ABSTRACT


The USAF is trying to identify a vocoder to insert into a Low
Probability of Intercept (LPI) communications system. It should be
small, lightweight, and low power, and capable of processing intelligible,
natural-sounding speech at 400 to 600 b/s. Two separate
systems are needed: one to be used soon in a brassboard system to
test the LPI concepts and one to be available as mid-1990s off-the-shelf
hardware for a production LPI system.

Weighted characteristic values are combined through a mapping
and summing procedure to form a Figure of Merit, F_s, for comparing the
systems. Each characteristic has a minimum, below which the system is
considered unacceptable. Thirty-eight current systems or research
efforts were identified. Of these, only 17 were determined to be
available in the desired time frame. These separated into two groups:
systems available in the mid-1980s and systems available in the mid-1990s.
The systems in each category were compared, and the three with the highest
F_s values were identified as the primary candidates. For the mid-1980s
effort the optimum systems are all Motorola prototypes: (1) the
Miniaturized Narrowband Secure Voice System, (2) the Manpack Vocoder,
and (3) the Advanced Technology Model Multi-Rate Processor LPC
Vocoder. For the mid-1990s effort none of the systems met all of the
minimum requirements. The desired data rates and equipment sizes have
not been combined in a single effort. More R&D funds are necessary to
advance the development of vocoders to the desired stage.


Director of Graduate Instruction
College of Engineering


PREFACE 


The work reported in this thesis was performed at the United
States Air Force Avionics Laboratory, a center for research operated
by the Air Force Wright Aeronautical Laboratories (AFWAL) with the
support of the Department of the Air Force.

The views and conclusions contained in this document are those
of the author and should not be interpreted as necessarily representing
the official policies, either expressed or implied, of the
United States Air Force or the United States Government.

The value judgements of the specific equipment presented in
this paper are a result of the author's choice and structuring of the
evaluation criteria. The results of the comparisons are not intended
to, and do not, reflect the quality or operational competence of any
product discussed.
Abstract


The USAF needs to identify a vocoder to insert into a Low
Probability of Intercept (LPI) communications system. It should be
small, lightweight, low power, capable of operating in many types of
aircraft, and capable of processing intelligible, natural-sounding
speech at 400 to 600 bits/s. Two separate units are needed:
one to be used in a near-term brassboard test system and one to be
used in a far-term production system. Weighted characteristic values
are combined through a mapping and summing procedure to form a Figure
of Merit for each system. Using these characteristic values, primary
vocoder candidates have been identified and are discussed in this
paper.
CHAPTER I


INTRODUCTION


Problem Statement

The United States Air Force is investigating the feasibility
of implementing a Low Probability of Intercept (LPI) communications
system. The objective of this report is to provide an independent,
non-government determination of the current level of vocoder
technology and to evaluate it for application to the USAF LPI
Communications Techniques Advanced Development Program (LPI Comm ADP).

The objective of the LPI Comm ADP is to develop from
"off-the-shelf" technology a flight-qualifiable brassboard system for
advanced development testing in the late 1980s. The brassboard development
is aimed at a mid-1990s production of a multimode
LPI/Anti-Jam/Secure Airborne Radio System (LASARS).

At the present time, a conceptual design study for an LPI
communications system is being performed under contract for the Air
Force. This study is examining the feasibility of utilizing advanced
device technology and is looking to integrate potential LPI signal
processing techniques into the LASARS. The technologies that yield
LPI gains and are under consideration include spread spectrum
modulation techniques, continuous adaptive power control, speech bandwidth
compression, adaptive interference suppression, adaptive beam-pointing
antennas, low-sidelobe antennas, adaptive frequency control
(multiple-band operation), adaptive null-steering antennas, and adaptive
signal masking. Figure 1-1 depicts the LPI system in general
terms and indicates where the various technologies fit within the
projected system. The J's and I's indicate jamming and interference
signals, respectively. The ESM block represents the electronic support
measures (ESM) of the jamming and/or intercepting receivers.

This report will address the speech bandwidth compression area
of the ongoing investigation by evaluating vocoder technology applicable
to both the 1980s brassboard and the 1990s multimode LASARS.

The report will be used by the LPI Comm ADP program manager and by
the conceptual design contractor in their analysis of, and recommendation
for, an overall communication system design integrating the LPI
technology areas.

Scope 

This report provides a comprehensive overview of current
vocoder production systems and of current research models for future
implementations. It contains discussions in several areas. These
include:

1. A description of the speech wave and the problems associated
with characterizing it.

2. An analysis of LPI gains from reduced information data rates.

3. A summary of the theory of operation of each competitive
vocoder technology.

4. The development of an assessment criterion for comparing
and selecting the most satisfactory vocoder configuration for near-term
brassboard feasibility tests and for far-term LASARS implementation.

5. A qualitative comparison of low-rate (narrowband) vocoder
technologies.

6. A quantitative comparison of low-rate vocoder technologies.

Figure 1-1. LPI technology conceptual design areas.
(Drawn from USAF/AFWAL VU-graphs.)

This report does not include any development of new vocoder technology
or any new methods for analyzing vocoders. Its purpose is strictly to
summarize the state of the art of vocoder technology, including all
alternative approaches and specific vocoder parameters, and to select
the vocoder candidates best suited for insertion into a brassboard and
for application to LASARS development and implementation.

Approach 

This report is generated from an extensive literature search.
The data sources include a periodical search and a government and
civilian technical report search conducted by accessing Department of
Defense (DoD), Air Force, National Aeronautics and Space Administration
(NASA), and various civilian report documentation databases; library
searches in three different libraries; private communications with
engineers at various research organizations and private companies; and
visits to some research organizations.

The report develops the mathematical relationship between data
bandwidth and the required transmitter power (Chapter II). This
chapter also develops an understanding of the problems associated with
analyzing the speech waveform and the basic theory of operation of the
major vocoder methods. A method of assessment for comparing and
selecting a specific vocoder is developed (Chapter III). The report
performs an in-depth qualitative (Chapter IV) and an in-depth
quantitative (Chapter V) comparison of the various vocoder systems being
produced and/or researched. Finally, it presents a set of
recommendations (Chapter VI) concerning the best vocoder implementations for
brassboard and production LPI communications systems.


CHAPTER II


BACKGROUND


Overview

Before discussing vocoder systems and implementations, it is
necessary to understand the concept of an LPI communications
system/channel and the gains to be achieved in this channel
through the use of vocoders. Also necessary is a discussion of the
speech waveform characteristics and how they must be approached for an
in-depth mathematical analysis and characterization. Finally, the
various vocoder methods or technologies must be presented to form a
basis for examining specific implementations.

Function of the Vocoder

The vocoder will constitute a major function of the communications
link. A typical digital communications system block
diagram is shown in Figure 2-1(a). The vocoder will form the basis
for the formatting/source encoding portion of the system.

Figure 2-1(b) summarizes the typical functions of the various portions
of the system. Different speech digitizers can be described as
follows (1).

Speech coders can be divided into two broad categories: waveform
coders and vocoders. The waveform coders attempt to mimic
the speech waveform as closely as possible. These coders are
capable of producing high-quality speech but only at bit rates
above about 16 kbits/s; the speech quality deteriorates significantly
as the bit rate is lowered below 10 kbits/s. Vocoders use
a parametric model of human speech production to achieve coding
efficiency. Vocoders can produce intelligible speech but often
the output speech has a synthetic quality. The speech quality
from vocoders cannot generally be improved by increasing the bit
rate.

Figure 2-1. Digital communication system: (a) typical block
diagram and (b) basic digital communication transformations.
(Taken from ref. 54)

The vocoder samples and quantizes the analog speech signal.
It then applies one of a variety of mathematical analysis techniques
to characterize the speech wave. This characterization is
then output in digital form from the source encoding portion of the
system. The vocoder therefore directly determines the basic data rate of
the system. Encryption, when used, usually does not add to the data
rate. When providing data error protection, the channel encoder adds
bits in proportion to the incoming data rate. The output power
required to maintain a specific signal-to-noise ratio is directly
proportional to the system information data rate. Therefore, decreasing
the data rate allows immediate reductions in output power, allowing
the system to operate at a minimum level and decreasing the
detectability of the communications signal.

The system data rate is directly proportional to the system
bandwidth, which determines the necessary power output. Reducing the
data rate allows corresponding reductions in the bandwidth. Reductions
in bandwidth limit the input noise power. If the input noise power is
reduced, the signal-to-noise ratio increases. Therefore, the transmitter
output power can be reduced while still maintaining a given,
required signal-to-noise ratio. Again, the system is assumed to be
operating at a minimum level of acceptable communications. It is
therefore desirable to reduce the data rate as far as possible while
maintaining an acceptable level of speech recognition and quality and
acceptable speaker identification. Discussion of the specific vocoder
gains is presented in Appendix A1.
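The power argument in the two paragraphs above can be made concrete with the standard link-budget relation P = (E_b/N_0) x N_0 x R: with the required energy-per-bit-to-noise-density ratio E_b/N_0 and the noise density N_0 held fixed, the required signal power scales linearly with the bit rate R (equivalently, the noise power N_0 x B falls as the bandwidth B shrinks with the rate). The sketch below assumes exactly that idealization; the function names and the 2,400 b/s baseline are illustrative and are not taken from the analysis in Appendix A1.

```python
import math


def required_power_watts(rate_bps: float, ebn0_db: float, n0_w_per_hz: float) -> float:
    """Received power needed to hold a given Eb/N0 at a given bit rate.

    P = (Eb/N0) * N0 * R, so for fixed Eb/N0 and noise density the
    required power grows linearly with the data rate R.
    """
    ebn0_linear = 10 ** (ebn0_db / 10)
    return ebn0_linear * n0_w_per_hz * rate_bps


def power_saving_db(old_rate_bps: float, new_rate_bps: float) -> float:
    """Transmitter power reduction (dB) from lowering the data rate, all else fixed."""
    return 10 * math.log10(old_rate_bps / new_rate_bps)


if __name__ == "__main__":
    # Illustrative comparison against an assumed 2,400 b/s baseline.
    for rate in (2400, 1200, 600, 400):
        print(f"{rate:>5} b/s: {power_saving_db(2400, rate):.2f} dB less power than 2,400 b/s")
```

Under these assumptions, dropping from 2,400 b/s to the 400 to 600 b/s target corresponds to roughly 6.0 to 7.8 dB less transmitter power.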
Speech Waveform Considerations

The speech waveform presents unique problems for mathematical
analysis. It has three general characteristics to be determined: it
consists of "voiced" sounds, "unvoiced" sounds, and a basic pitch
period. Voiced sounds are more or less periodic in nature and are
almost the same in shape for every speaker, with some slight
differences in frequency content. Unvoiced sounds are noise-like in
nature, varying only in frequency content and amplitude. The pitch
period determines the basic repetition period of the voiced sounds;
it is set by the vibration of the vocal cords and differs for every
speaker. Figure 2-2 shows samples of these characteristics, and
Figure 2-3 shows samples of the frequency content of the two speech
types. Speech production is generally characterized as the convolution
of an excitation (pitch pulses for voiced sounds) with the vocal tract
impulse response.

The speech production process is best described in the
following quote (99).

The human speech production system consists of an air pressure
source (the lungs) feeding through the vocal cords and combined
oral and nasal passages. The vocal cords can be caused to vibrate
and provide a periodic (voiced) excitation to the vocal tract, or
to be abducted to allow airflow into the tract. A constriction in
the tract during a period of airflow can cause turbulence
(friction) noise to be generated just downstream of the constriction
(with or without voicing). The oral and nasal passages form
a variable configuration deformable set of resonators connected to
the excitation sources. Radiation of the resulting pressure wave
from the mouth and nose causes the speech. Although the sources
are nonwhite and the radiation process is frequency dependent, the
model can be simplified by combining the frequency dependent
effects into a single filter that contains both poles and zeros,
and by assuming that the source is either a periodic impulse train
or white noise.

Figure 2-3. Spectral models for voiced and unvoiced speech.
(Taken from reference 19, page 3G.3.1)
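The simplified source-filter model in the quote can be illustrated with a minimal sketch: a voiced excitation (a periodic impulse train) or an unvoiced excitation (white noise) is convolved with a vocal tract impulse response. The one-pole tract response, the 80-sample pitch period (100 Hz at an assumed 8 kHz sampling rate), and the helper names are hypothetical choices for illustration only.

```python
import random


def excitation(n_samples: int, voiced: bool, period: int = 80, seed: int = 0) -> list[float]:
    """Source model: a periodic unit-impulse train if voiced, white Gaussian noise if not."""
    if voiced:
        return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Linear convolution y = x * h, truncated to len(x) output samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        for k in range(min(len(h), n + 1)):
            y[n] += h[k] * x[n - k]
    return y


# Hypothetical one-pole "vocal tract": h[k] = 0.9**k. A real tract filter has
# several resonances (formants) and, per the quote, zeros as well.
tract = [0.9 ** k for k in range(200)]
voiced_speech = convolve(excitation(400, voiced=True), tract)
unvoiced_speech = convolve(excitation(400, voiced=False), tract)
```

Each output pulse in the voiced case is a copy of the tract response repeated once per pitch period, which is the structure later methods try to separate back into excitation and filter.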

During the formation of a word or syllable, several of these
voiced and unvoiced sounds are usually combined, so the
speech wave is constantly changing. To analyze the speech
wave, it is normal to "window" the speech signal: a specific time
segment of the wave, usually around 20 milliseconds, is chosen, during
which the speech signal is considered to be stationary. Different
analysis methods have different optimum window lengths, determined
through extensive research. A discussion of this determination is
beyond the scope of this paper. A more detailed presentation of the
characteristics of the speech wave can be found in Appendix A2.
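A minimal framing sketch, assuming an 8 kHz sampling rate and a Hamming window (neither is specified in the text): each 20 ms window becomes 160 samples, over which the signal is treated as stationary.

```python
import math


def frame_signal(x, fs_hz=8000, frame_ms=20.0, hop_ms=None):
    """Split x into Hamming-windowed analysis frames.

    frame_ms is the window length (about 20 ms in the text); hop_ms defaults to
    the frame length, i.e. non-overlapping frames. A partial frame at the end is dropped.
    """
    frame_len = int(round(fs_hz * frame_ms / 1000))
    hop = frame_len if hop_ms is None else int(round(fs_hz * hop_ms / 1000))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]
```

With these assumptions, one second of speech yields 50 non-overlapping frames, or 99 frames with a 10 ms hop.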


Vocoder Methods

Speech signals are known to be highly redundant. The most
effective method to reduce this redundancy and thus reduce the
channel capacity required for transmission of speech signals is to
extract and transmit only the major characteristics of speech
at a regular interval (typically 10 to 30 ms). These characteristics
are then used at the receiver to reconstruct speech.
A system based on this method is commonly known as an
analysis-synthesis system. (132)

Analysis-synthesis systems are also known as vocoders. These
are systems for analyzing, parameterizing, quantizing, and then
resynthesizing the speech waveform. These vocoders constitute a class
of speech bandwidth compression techniques described in general terms
by the following quote (99).

Narrow-band speech compression systems generally require analysis
and synthesis of speech with separate characterizations of the
vocal tract frequency response and the excitation. These systems,
generically termed vocoders, achieve reasonable quality speech
reproduction at rates of 5 kbits/s and below by transmitting
quantized parameters of this source-filter model. This bandwidth
reduction is generally achieved at the cost of a loss of voice
intelligibility and naturalness. In addition, vocoder performance
tends to be talker-sensitive, and to be fragile in that extraction
of source and vocal tract parameters are affected adversely by
additive acoustic noise and signal distortion at the vocoder
input.

There are seven major, distinct voice analysis-synthesis
methods: the channel, formant, homomorphic, pattern-matching,
phase, linear predictive coding, and spectral envelope estimation
vocoders. Other methods exist that combine aspects of these methods,
but none of them is in production or undergoing extensive research,
so these combination methods will not be considered.

Channel Vocoder. The channel vocoder, or spectrum-channel
vocoder, separates the speech signal into 12 to 28 frequency channels.
Contiguous bandpass filters are used to perform this separation. Each
channel is rectified and then low-pass filtered. The resulting
time-varying signal represents the amount of signal energy in the given
frequency range. This can then be quantized with a few bits and
transmitted. A final channel consists of a voiced/unvoiced (V/UV)
detector and a pitch extractor. This information is quantized and
transmitted and then used in the synthesizer, along with the channel
signals, to control the frequency response of a time-varying resonant
filter so that it corresponds to the spectral envelope measured at the
analyzer. See Appendix A3.
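The channel vocoder analyzer described above (contiguous bandpass filters, rectification, low-pass smoothing, periodic sampling) can be sketched as follows. The filter design (second-order bandpass sections in the RBJ "Audio EQ Cookbook" form), the 16 log-spaced channels between 200 and 3,200 Hz, the 25 Hz smoother, and the 20 ms frame at 8 kHz are assumptions chosen for illustration; the text only specifies 12 to 28 channels. The V/UV and pitch channel and the quantizer are omitted.

```python
import math


def bandpass_coeffs(fc_hz, fs_hz, q):
    """Second-order bandpass biquad (RBJ cookbook form, 0 dB peak gain).

    Returns normalized (b0, b1, b2) and (a1, a2) for
    y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2].
    """
    w0 = 2 * math.pi * fc_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0), (-2 * math.cos(w0) / a0, (1 - alpha) / a0)


def channel_envelopes(x, fs_hz=8000, n_channels=16, f_lo=200.0, f_hi=3200.0,
                      q=4.0, smooth_hz=25.0, frame_len=160):
    """Channel-vocoder analysis: bandpass -> full-wave rectify -> low-pass,
    sampling each channel's envelope once per frame (160 samples = 20 ms at 8 kHz).

    Returns (center_frequencies, frames), where frames[t][k] is channel k's level
    in frame t; these are the values that would be quantized and transmitted.
    """
    ratio = (f_hi / f_lo) ** (1 / (n_channels - 1))
    centers = [f_lo * ratio ** k for k in range(n_channels)]   # log-spaced channels
    beta = 1 - math.exp(-2 * math.pi * smooth_hz / fs_hz)       # one-pole low-pass
    tracks = []
    for fc in centers:
        (b0, b1, b2), (a1, a2) = bandpass_coeffs(fc, fs_hz, q)
        x1 = x2 = y1 = y2 = 0.0
        env, track = 0.0, []
        for n, xn in enumerate(x):
            yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1, y2, y1 = x1, xn, y1, yn
            env += beta * (abs(yn) - env)        # rectify, then smooth
            if (n + 1) % frame_len == 0:
                track.append(env)
        tracks.append(track)
    return centers, [list(frame) for frame in zip(*tracks)]
```

Feeding a 1,000 Hz tone through this sketch puts the largest level in the channel centered nearest 1,000 Hz (about 1,056 Hz with these settings).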

Formant Vocoder. The formant method assumes that the speech
wave can be characterized by an envelope consisting of several
prominent maxima. Below 3,000 Hz there are usually three maxima, and
below 4,000 Hz there are four or five maxima. These are known as
"formants." Here the analyzer determines the frequency locations,
bandwidths, and relative amplitudes of the individual formants. This
information is coded and transmitted to the synthesizer, which uses it
to control the resonances of a formant synthesizer consisting of tuned
resonant circuits. There are several methods of determining the formant
frequencies. One method is to measure the rate of axis (zero) crossing
of filter-separated formants. This method is fairly inaccurate
without additional adjustments. A second method is to "channelize"
the signal, measure the amplitude of each channel, and average to
determine the most prominent frequencies. A final major method
involves finding the average of the derivative of the time signal and
dividing by the average of the time signal itself. A prominent
vocoder method is the analysis-by-synthesis vocoder, which is one form
of formant vocoder. It generates known artificial spectra, compares
them to the incoming speech wave, and matches these spectra through
iteration; the known characteristics of the "artificial" signal are
then transmitted. The pitch extraction and V/UV decision are
made in the same way as in the channel vocoder. See Appendix A3.
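Two of the formant-frequency estimators mentioned above can be sketched for a single, already filter-separated formant. The zero-crossing estimator counts axis crossings (two per cycle). The derivative-ratio estimator here assumes the averages are taken over magnitudes (rectified signals), since a plain average of the derivative of an oscillating signal is close to zero; for a sinusoid A sin(wt), mean |dx/dt| divided by mean |x| equals w, so the frequency is that ratio divided by 2 pi. The function names are hypothetical.

```python
import math


def zero_crossing_frequency(x, fs_hz):
    """Dominant frequency from the rate of axis crossings (two per cycle)."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings * fs_hz / (2 * len(x))


def derivative_ratio_frequency(x, fs_hz):
    """Dominant frequency as mean|dx/dt| / (2*pi*mean|x|), exact for a pure sinusoid
    in continuous time; the first difference used here reads slightly low."""
    mean_abs = sum(abs(v) for v in x) / len(x)
    derivs = [(b - a) * fs_hz for a, b in zip(x, x[1:])]
    mean_abs_deriv = sum(abs(d) for d in derivs) / len(derivs)
    return mean_abs_deriv / (2 * math.pi * mean_abs)
```

On a clean 440 Hz tone sampled at 8 kHz both estimators land within about 1 percent of 440 Hz; on real speech, other formants and noise bias them, which is why the text calls the crossing method inaccurate without further adjustment.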

Homomorphic Vocoder. The homomorphic or cepstrum vocoder uses an
FFT approach through homomorphic filtering concepts. The convolved
speech signal is transformed into a product of spectral magnitudes by
a high-resolution Fourier transform. This product is turned into a
sum by taking the logarithm of the spectral magnitude, which yields a
rapidly varying pitch component and a slowly varying vocal tract
component. Another Fourier transform (or inverse transform) [95,104]
is then performed, separating the signal into a "low-time" component
containing the vocal tract information and a "high-time" component
containing the excitation or vocal-cord information. Pitch extraction
and the V/UV decision are determined using the high-time information.
This method is popular for use with other vocoder methods as the pitch
extraction and V/UV decision method. See Appendix A3.
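The homomorphic pitch extraction described above can be sketched with the real cepstrum: take the log spectral magnitude, apply an inverse transform, and search for the largest peak in the high-time (quefrency) range of plausible pitch periods. The 60 to 400 Hz search range and the function names are assumptions. A practical analyzer would apply a tapered window (e.g., Hamming) to each frame, use an FFT rather than this direct DFT, and compare the peak height with a threshold to make the V/UV decision.

```python
import cmath
import math


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of log|DFT(frame)|.

    Uses a direct O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(frame)
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]

    def dft(seq):
        return [sum(seq[k] * twiddle[(k * m) % n] for k in range(n)) for m in range(n)]

    # Small floor avoids log(0) at bins with no energy.
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    # log_mag is real and even, so its inverse DFT equals its forward DFT / n.
    return [v.real / n for v in dft(log_mag)]


def cepstral_pitch(frame, fs_hz, f_min=60.0, f_max=400.0):
    """Pitch period (samples): location of the largest high-time cepstral peak
    within the quefrency range of plausible pitch periods. Returns (period, cepstrum)."""
    c = real_cepstrum(frame)
    q_lo = int(fs_hz / f_max)
    q_hi = min(int(fs_hz / f_min), len(frame) // 2)
    return max(range(q_lo, q_hi + 1), key=lambda q: c[q]), c
```

For a steady 100 Hz (80-sample) pulse train passed through a single resonance at an assumed 8 kHz rate, the peak falls at a quefrency of 80 samples, i.e., a pitch of 100 Hz; the vocal tract information sits at the low-time end of the same cepstrum.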

Pattern Matching Vocoder. The pattern-matching vocoder
transmits basically three items of information: the memory location
of the stored spectral pattern that most closely matches the speech
segment, the pitch information, and the V/UV decision. Sometimes error
information about the difference between the stored pattern and the
speech segment is sent so that it can be used to adjust the pattern
recalled from memory. This increases the data rate somewhat, and most
designers have determined this information to be essentially
unnecessary. Almost any speech analysis method can be used to
transform the speech signal into a form matching those in
memory. Windowed speech segments are again used. With today's
high-speed computers, smaller windows are used, allowing more
comparisons and more accurately synthesized speech.
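A sketch of the pattern-matching idea, assuming squared Euclidean distance between a frame's spectral feature vector and each stored pattern (the text leaves the analysis method and distance measure open). It also tallies the bits per frame: the pattern index, about seven pitch bits (as stated later in this chapter), and one V/UV bit, with no error information sent. The names and the example memory size are hypothetical.

```python
import math


def nearest_pattern(features, codebook):
    """Index of the stored pattern closest to the frame (squared Euclidean distance)."""
    def distance(pattern):
        return sum((f - p) ** 2 for f, p in zip(features, pattern))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def bits_per_frame(codebook_size, pitch_bits=7, voicing_bits=1):
    """Pattern index + pitch + one V/UV bit; no residual (error) information is sent."""
    return math.ceil(math.log2(codebook_size)) + pitch_bits + voicing_bits


def data_rate_bps(codebook_size, frame_ms, pitch_bits=7):
    """Transmitted rate when one frame of parameters is sent every frame_ms."""
    return bits_per_frame(codebook_size, pitch_bits) * 1000.0 / frame_ms
```

With a hypothetical 1,024-pattern memory (10 bits) and 30 ms frames, each frame needs 18 bits, i.e., 600 b/s, the upper end of the target range.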

Phase Vocoder. The phase vocoder uses a channel method to
generate "short-time" (windowed) amplitude and phase spectra to
represent the characteristics of the speech wave. This method differs
from the channel vocoder in that the derivative of the signal phase is
determined in the analysis procedure and used in the synthesis
procedure to regenerate the speech signal. With this method no pitch
or V/UV information is needed to modulate with the spectral
information in the regeneration. See Appendix A3.
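The phase-derivative step of the phase vocoder can be sketched for one channel: take windowed short-time DFT values for channel k in two frames a hop apart, remove the phase advance expected at the channel's center frequency, wrap the remainder to [-pi, pi), and convert it to an instantaneous frequency. The Hann window, 256-sample frames, 64-sample hop, and 8 kHz rate are assumptions, not parameters from the text.

```python
import cmath
import math


def stft_bin(x, start, n, k):
    """Hann-windowed short-time DFT coefficient of bin k for x[start:start + n]."""
    acc = 0j
    for i in range(n):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)
        acc += w * x[start + i] * cmath.exp(-2j * math.pi * k * i / n)
    return acc


def instantaneous_frequency(x, fs_hz, k, n=256, hop=64, start=0):
    """Channel-k frequency (Hz) from the phase derivative between two frames `hop` apart."""
    omega_k = 2 * math.pi * k / n                       # bin center, radians/sample
    phi0 = cmath.phase(stft_bin(x, start, n, k))
    phi1 = cmath.phase(stft_bin(x, start + hop, n, k))
    dphi = phi1 - phi0 - omega_k * hop                  # deviation from expected advance
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    return (omega_k + dphi / hop) * fs_hz / (2 * math.pi)
```

For a 1,030 Hz tone, the channels centered at 1,000 Hz (k = 32) and 1,031.25 Hz (k = 33) both report about 1,030 Hz, which is why the phase derivative can stand in for separate pitch information during resynthesis.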

Linear Prediction Vocoder. Linear predictive coding vocoders
operate in the time domain rather than in the frequency domain. This
method uses weighted sums of a given number of past samples in order
to predict the present sample. The weights form the adaptive portion
of the analysis. They are adjusted to minimize the error signal
between the actual and predicted speech samples. The system transmits
selected parameters of this analysis rather than the speech samples
themselves. These transmitted parameters include the predictor
coefficients, gain, pitch information, and the V/UV decision. In the
process of determining the predictor coefficients, several intermediate
sets of coefficients are formed. Any of this information can be
transmitted, the most common set being the reflection coefficients.
See Appendix A3.
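The intermediate coefficient sets mentioned above arise naturally in the Levinson-Durbin recursion, one common way (assumed here; the text does not name a method) to solve for the predictor weights from a frame's autocorrelation. Each stage produces a reflection coefficient k_i; for a valid autocorrelation every |k_i| < 1, which is one reason they are convenient to quantize. Sign conventions for k_i vary between references; this sketch predicts x[n] as the sum of a_j x[n - j].

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, k, err): predictor weights a[0..p-1] (for lags 1..p),
    reflection coefficients k[0..p-1], and the final prediction-error energy.
    """
    a = [0.0] * (order + 1)          # a[0] unused; a[j] weights x[n - j]
    k = []
    err = r[0]
    for i in range(1, order + 1):
        ki = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        k.append(ki)
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        err *= 1 - ki * ki           # error energy shrinks at each stage
    return a[1:], k, err
```

For the decaying sequence 0.9^n, a second-order analysis returns a_1 of about 0.9 and a_2 of about 0, with k_1 = 0.9, as expected for a first-order signal.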

Spectral Envelope Estimation Vocoder. A final major vocoder
or analysis-synthesis method is the spectral envelope estimation
vocoder, the most recently developed technique. In
addition to generating parameters used in the regeneration of the
speech signal, this method provides an estimate of the background noise
for use in noise suppression. The system approximates the spectral
envelope (the vocal tract filter response) of the speech wave. The
pitch extraction technique forms an integral part of the analysis
system: the average pitch is constantly fed back into the system so
that peaks in the speech spectrum can be estimated. The smoothing of
these peaks forms the estimated envelope to be quantized for
transmission. The pitch and V/UV information is also transmitted.

This method, like the previous methods, uses a windowed frame of
speech, but this window is adaptive in length, averaging about 2.5
times the pitch period. The background noise estimator,
when used, adds to the vocoder data rate. See Appendix A3.

In all methods except the phase vocoder, pitch and voicing
information is necessary. There are many different methods
for extracting this information [4, 16, 34, 42, 56, 78, 95, 99, 115].
Generally, the periodic nature of voiced sounds is detected from the
speech signal. If no periodic or very slowly varying set of peaks is
detected, the frame is classified as unvoiced and the synthesizer
generates the unvoiced (noise) excitation. Only one bit is needed
to transmit the V/UV decision. The detected periodic
signal is measured to determine the pitch period. Usually about seven
bits are used to quantize this parameter.
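A minimal autocorrelation-based pitch and V/UV sketch, standing in for the many published methods cited above rather than reproducing any one of them: the lag of the largest normalized autocorrelation within a 60 to 400 Hz range gives the pitch period, and a peak below a threshold marks the frame unvoiced. The 0.3 threshold, the search range, and the packing (one voicing bit plus a 7-bit period code offset from the shortest lag) are hypothetical.

```python
def pitch_and_voicing(frame, fs_hz, f_min=60.0, f_max=400.0, threshold=0.3):
    """Hypothetical autocorrelation detector. Returns (voiced, period_in_samples)."""
    n = len(frame)
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return False, 0
    lag_lo = int(fs_hz / f_max)
    lag_hi = min(int(fs_hz / f_min), n - 1)
    best_lag, best_r = 0, -1.0
    for lag in range(lag_lo, lag_hi + 1):
        r = sum(frame[i] * frame[i - lag] for i in range(lag, n)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    voiced = best_r >= threshold
    return voiced, (best_lag if voiced else 0)


def encode_frame(voiced, period, lag_lo=20):
    """Pack 1 V/UV bit and a 7-bit pitch code (128 levels) into 8 bits.

    The period is stored as an offset from lag_lo (20 samples = 400 Hz at 8 kHz),
    clamped to 0..127. Unvoiced frames send 0.
    """
    if not voiced:
        return 0
    return (1 << 7) | max(0, min(127, period - lag_lo))
```

For a 100 Hz pulse train at an assumed 8 kHz rate, the detector reports a voiced frame with an 80-sample period, which packs to the 8-bit code 188.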

The pitch extraction and V/UV decision process is an integral
part of any vocoder. Therefore, when selecting a particular vocoder,
no choice of this process is available: the designer of the vocoder
has already chosen the pitch extraction algorithm that works best with,
or is most economical in, that vocoder. Further information on
pitch extraction and V/UV decisions can be found in almost every
reference listed in the Bibliography.


CHAPTER III


ASSESSMENT METHODOLOGY


Overview

Vocoder systems are implemented using a variety of methods, as
mentioned in Chapter II. Additionally, each method usually
has a number of different implementations. This diversity of systems
establishes the need for a method to quantitatively compare these
systems with each other. This chapter presents the parameters, the
parameter constraints, and the minimum specifications used for the
vocoder evaluation. A Figure of Merit analysis method is developed for
comparing the vocoders. Then the differences in evaluating a system
for near-term and far-term implementations are discussed, and the
associated modifications to the Figure of Merit analysis are presented.
Finally, each parameter and the choice of its minimum specification
are discussed.

Parameters and Constraints

Any piece of equipment can be compared in terms of performance,
size, weight, cost, availability, etc. When parameters of this
type are considered, minimum acceptable values are usually assigned.
In determining these minimum values, several factors interact to
provide constraints. The constraining factors in the selection of a
vocoder to be used on board an aircraft are: (1) mission requirements,
(2) platform requirements (aircraft physical limitations), and
(3) user requirements. Quite often, state-of-the-art technology
capabilities require that some parameter constraints be relaxed. The
minimum value assigned to a parameter can result from one or more of
the constraints mentioned above. Table 3-1 lists the parameters
considered important in evaluating a vocoder for an airborne brassboard
LPI radio system, in order of decreasing importance. The table also
gives the constraining element(s) for each parameter (identified as
1, 2, or 3 from above) and the associated minimum specification.


Figure  of  Merit  Analysis 
Near-Term  Brassboard  System 

The  brassboard  system  conceptual  design  testing  will  be  ini¬ 
tiated  within  two  or  three  years  from  the  present.  Based  upon 
experience  (54,  120,  124,  126,  136)  with  hardware  development,  this 
means  the  hardware  must  be  available  now  either  in  production  or  as  an 
engineering  prototype  which  a  company  would  be  willing  to  sell  to  the 
Air  Force.  This  fact  alone  eliminates  about  80  percent  of  the  vocoder 
systems/methods  being  researched  by  various  organizations.  For  this 
phase  of  the  LPI  Comm  ADP  only  one  or  two  systems  will  need  to  be 
purchased  for  testing  purposes.  The  flight  tests  will  be  conducted  to 
determine  the  feasibility  of  implementing  an  LPI  communciations 
system.  Each  of  the  parameters/specifications  listed  in  Table  3-1  can 
take  on  a  range  of  values  given  in  the  Figure  of  Merit  Analysis  chart 
shown  in  Table  3-2.  These  values  are  listed  in  a  set  of  columns 


20 


TABLE  3-1 

SYSTEM  PARAMETERS  WITH  CONSTRAINTS  AND 
MINIMUM  SPECIFICATIONS  FOR 
NEAR-TERM  ANALYSIS 


Parameter 

Constraints* 

Specification 

Data  Rate  (bit  per  second 
or  b/s) 

1 

_<  2400 

Intelligibility 

1,  3 

>  80%  DRTt 

Input  Probability  of  Error 

1 

>_  10”3 

Size 

2 

_<  3000  in3  (1.76  ft3) 

Wei ght 

2 

<_  50  lb. 

Power  Consumption 

2,  3 

£  100  W 

Processing  Delay 

1,  3 

Real  time,(<  100  ms 
throughput) 

System  Avail abil  ity 

1,  3 

Engineering  Prototype 
(minimum) 

Production  Cost  (Lots  of  1,000) 

3 

£  $40, 000/unit 

Speaker  Dependence 

3 

None 

Vocabulary  Dependence 

3 

None 

System  "Learning"  Time 

1,  3 

None 

*  1.  Mission  requirements 

2.  Platform  requirements 

3.  User  requi rements 

*  Diagnostic  Rhyme  Test  Score 

(discussed  later) 

4**  «  .  1. 1.  •  .  -  .  .* 


TABLE  3-2 

FIGURE  OF  MERIT  ANALYSIS  SYSTEM  FOR  BRASSBOARD  INSERTION  SYSTEM  (Sample  Chart) 


OVERALL  SYSTEM  FIGURE  OF  MERIT  (F«).  TOTAL 


labeled "Parameter Figure of Merit." Each listed value is mapped, with a one-to-one correspondence, to the Figure of Merit (0 to 10) heading its column, giving an assigned parameter Figure of Merit, F_p. In the table, each parameter is also assigned a weight, W_p, according to its relative importance; the weights are normalized so that they sum to 1.0. The product of the parameter weight and the parameter Figure of Merit is the parameter score, P_s:

$$P_{s_i} = W_{p_i} F_{p_i} \tag{3-1}$$

The sum of the parameter scores over the twelve parameters is the System Figure of Merit, F_s:

$$F_s = \sum_{i=1}^{12} P_{s_i} \tag{3-2}$$

Because the weights sum to 1.0 and each F_p lies between 0 and 10, F_s ranges in value from 0.000 to 10.000. Each system under consideration
is  evaluated  independently  on  a  table  identical  to  the  one  presented 
here.  After  system  evaluation,  the  systems  are  compared  to  each  other 
using  the  system  Figure  of  Merit  totals.  The  system  having  the  highest 
Figure  of  Merit  is,  theoretically,  the  optimum  system. 
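As an illustration only, Eqs. (3-1) and (3-2) can be sketched in Python. The three parameters and their weights below are hypothetical placeholders (the weights of Table 3-2 are not reproduced in this text); only the arithmetic, with weights summing to 1.0 and each F_p on the 0 to 10 scale, follows the procedure described above.

```python
import math

def system_figure_of_merit(weights: dict[str, float], fom: dict[str, float]) -> float:
    """Return F_s = sum_i W_p_i * F_p_i (Eqs. 3-1 and 3-2).

    weights: parameter name -> weight W_p (must sum to 1.0)
    fom:     parameter name -> parameter Figure of Merit F_p (0..10)
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("parameter weights must be normalized to 1.0")
    # Parameter scores P_s (Eq. 3-1).
    scores = {p: weights[p] * fom[p] for p in weights}
    # System Figure of Merit F_s (Eq. 3-2).
    return sum(scores.values())

# Hypothetical weights and Figures of Merit, for illustration only.
weights = {"data_rate": 0.5, "intelligibility": 0.3, "size": 0.2}
fom = {"data_rate": 6, "intelligibility": 8, "size": 5}
print(round(system_figure_of_merit(weights, fom), 3))  # 6.4
```

Because the weights sum to 1.0, F_s is a weighted average of the parameter Figures of Merit, which is why it stays on the same 0 to 10 scale.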

In the mapping of parameter values to Figures of Merit, it is not necessary that a linear relationship exist. For example, the data rate mapping of b/s to the Figure of Merit is shown in Table 3-3.


TABLE 3-3

DATA RATE MAPPING

Parameter Value (b/s)    Figure of Merit
< 150                    10
150                      9
200                      8
300                      7
400                      6
600                      5
800                      4
1,200                    3
1,600                    2
2,400                    1
> 2,400                  0

Examination of Table 3-2 shows that this type of nonlinearity exists for most parameters. Not all of the parameters have a range of 11 values; in these cases, the range is distributed as evenly as possible over the Figure of Merit mapping range. This is best demonstrated by the availability mapping for systems under consideration for brassboard insertion, detailed in Table 3-4.
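The data rate mapping of Table 3-3 is a lookup over breakpoints, which can be sketched as below. The table does not say how a rate falling between two tabulated values (say 500 b/s) is scored; this sketch assumes it takes the Figure of Merit of the next higher tabulated rate, which is the conservative choice.

```python
import bisect

# Breakpoints (b/s) and Figures of Merit transcribed from Table 3-3.
# Assumption: a rate between two tabulated values gets the Figure of Merit
# of the next higher tabulated rate (not stated in the thesis).
_RATES = [150, 200, 300, 400, 600, 800, 1200, 1600, 2400]
_FOMS = [9, 8, 7, 6, 5, 4, 3, 2, 1]

def data_rate_fom(bps: float) -> int:
    """Map a data rate in b/s to its parameter Figure of Merit (0..10)."""
    if bps < 150:
        return 10
    if bps > 2400:
        return 0  # exceeds the 2400 b/s minimum specification
    i = bisect.bisect_left(_RATES, bps)  # index of first tabulated rate >= bps
    return _FOMS[i]

print(data_rate_fom(400), data_rate_fom(500), data_rate_fom(2400))  # 6 5 1
```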

In  each  system  evaluation  according  to  Table  3-2,  any 
parameter  scoring  a  0  parameter  Figure  of  Merit  is  marked  with  an 
asterisk  (*)  in  the  parameter  score  column.  Any  system  marked  in  this 
manner  has  fallen  below  a  minimum  specification  and  is  therefore 
deleted  from  further  consideration.  The  data  on  these  systems  is 
still  provided  in  the  event  that  future  considerations  dictate  a 
relaxing  of  any  minimum  specifications. 
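Checking a system against the minimum specifications amounts to a set of threshold comparisons. The sketch below encodes the numeric limits of Table 3-1 and reports which parameters fall below them (and would therefore score 0 and be marked with an asterisk). Treating every limit as inclusive is an assumption, as is skipping parameters with no reported value; the binary parameters (speaker and vocabulary dependence, learning time) are omitted. The example values are the approximate figures reported for the E-Systems CV-3333/U in Chapter IV (Table 4-9).

```python
# Numeric near-term minimum specifications transcribed from Table 3-1.
# ("max", x): value must not exceed x; ("min", x): value must be at least x.
# Assumption: all boundaries are treated as inclusive.
LIMITS = {
    "data_rate_bps": ("max", 2400),
    "drt_percent": ("min", 80),
    "input_pe": ("min", 1e-3),   # tolerable input probability of error
    "size_in3": ("max", 3000),
    "weight_lb": ("max", 50),
    "power_w": ("max", 100),
    "delay_ms": ("max", 100),
    "cost_usd": ("max", 40_000),
}

def failed_parameters(system: dict[str, float]) -> list[str]:
    """Return the parameters that fall below a minimum specification.

    Each would receive a parameter Figure of Merit of 0 (an asterisk);
    a non-empty result removes the system from consideration.
    Parameters missing from `system` are skipped.
    """
    failed = []
    for name, (kind, limit) in LIMITS.items():
        if name not in system:
            continue
        value = system[name]
        ok = value <= limit if kind == "max" else value >= limit
        if not ok:
            failed.append(name)
    return failed

# Approximate values reported for the E-Systems CV-3333/U (Table 4-9).
cv3333u = {"data_rate_bps": 2400, "drt_percent": 88, "input_pe": 2e-3,
           "size_in3": 2800, "weight_lb": 55, "power_w": 200,
           "delay_ms": 75, "cost_usd": 25_000}
print(failed_parameters(cv3333u))  # ['weight_lb', 'power_w']
```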




TABLE 3-4

AVAILABILITY MAPPING

Parameter Value          Figure of Merit
Production               10 to 6
Engineering Prototype    5 to 1
All Others               0

Quite often in evaluating the systems, one or more parameter values may not be available. In this case, two methods for adjusting the table exist. In the first method, the parameter value is estimated through conversation with the developing engineers or by comparison with similar systems. If estimates are not possible, the parameter is left unassigned and the system Figure of Merit is renormalized to F_s' by dividing it by the sum of the weights of the N parameters that are assigned:

$$F_s' = \frac{F_s}{\sum_{i=1}^{N} W_{p_i}} \tag{3-3}$$

In either case, the system involved is flagged so that the Air Force contractor or investigator can identify it and attempt to obtain more accurate data if desired. In this evaluation, both methods are used: estimated systems are flagged with a capital E and renormalized systems with a capital R.
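The renormalization of Eq. (3-3) can be sketched as follows, with hypothetical weights for three parameters and `None` standing for an unassigned parameter. Dividing by the sum of the assigned weights keeps F_s' on the 0 to 10 scale; without it, an unassigned parameter would silently count as a score of 0. An estimated value (flag E) would simply be supplied in place of `None`.

```python
from typing import Optional

def renormalized_fom(weights: dict[str, float],
                     fom: dict[str, Optional[float]]) -> tuple[float, str]:
    """Return (F_s', flag) following Eq. (3-3).

    Parameters whose F_p is None are unassigned: they are dropped, and the
    partial F_s is divided by the sum of the remaining weights.
    The flag is "R" if any parameter was unassigned, otherwise "".
    """
    assigned = [p for p in weights if fom.get(p) is not None]
    if not assigned:
        raise ValueError("no parameters assigned")
    f_s = sum(weights[p] * fom[p] for p in assigned)  # partial F_s, Eq. (3-2)
    w_sum = sum(weights[p] for p in assigned)         # denominator of Eq. (3-3)
    flag = "R" if len(assigned) < len(weights) else ""
    return f_s / w_sum, flag

weights = {"data_rate": 0.5, "intelligibility": 0.3, "size": 0.2}  # hypothetical
fom = {"data_rate": 6, "intelligibility": 8, "size": None}         # size unknown
value, flag = renormalized_fom(weights, fom)
print(round(value, 3), flag)  # 6.75 R
```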

Figure of Merit Analysis, Far-Term LASARS

In the event the brassboard flight tests indicate that LPI communications constitute a viable concept, and given a continuing need with the associated funds appropriated, the Air Force will proceed into a system development phase. When the decision to continue is made, a final decision will also be made as to which component subsystems to purchase and incorporate. This implementation is projected to occur in 1993 or 1994. By then, all of the subsystems must have completed engineering development and military specification testing and be ready for production, with the producing company already setting up its production line. Based on the group experience mentioned previously, engineering development requires three to five years, and the subsequent military specification testing requires an additional two to five years. In the worst case, this means ten years of development and testing, with additional time required to establish production capabilities. To meet this deadline, the systems considered must currently exist as laboratory models or as algorithm simulations designed to modify existing systems. Totally new techniques, now existing only as computer simulations, will probably require twelve to fifteen years to reach the stage required by the Air Force (54).

Current experiments with modified quantizing schemes and new insights into perceptual differences indicate that lower data rates with improved performance will become available. Additionally, new systems implemented with state-of-the-art Very Large Scale Integration (VLSI) and Very High Speed Integrated Circuit (VHSIC) technology will be smaller and lighter, and will require less power, than current production equipment. This leads to a tightening of the minimum specifications. Added to these reductions is a tightening of the requirements that results from the applications of the system being designed. Table 3-5 shows the modified minimum specifications and a revised order of importance to be used in the far-term system evaluation. These specifications are used in Table 3-6 to compare the far-term systems.

Discussion  of  Parameters 

The system parameters used to evaluate and compare vocoders are listed in Tables 3-1 and 3-5. These parameters are not of equal importance; the relative importance of each is indicated by the parameter weight in Tables 3-2 and 3-6. The order of importance derives from estimated requirements of the Air Force and the needs of the LPI Comm ADP. These parameters constitute as complete a set as is possible at this time. Even within this group, some parameter values are often unavailable for a given system. Each parameter is discussed below.

Data Rate. Speech information rate is the most important parameter because it directly determines the vocoder's contribution to improved LPI capability, as discussed in Chapter II and in Appendix A2. Systems are being researched with data rates ranging from 75 b/s to 16,000 b/s. The maximum rate of 2400 b/s was established because the Air Force has already approved a 2400 b/s LPC vocoder for a different implementation. Air Force, Navy, DoD, and NATO specifications now exist describing vocoder output formats that conform to a specific form of LPC (LPC-10) at 2400 b/s. These specifications do not constrain advanced development implementations.

TABLE 3-5

ANALYSIS SYSTEM FOR LASARS IMPLEMENTATION SYSTEM (Sample Chart)

Intelligibility. Intelligibility is a key parameter and is
nearly  as  important  as  data  rate.  Unlike  telephone  conversations  with 
nearly  unlimited  context,  aircraft  mission  communications  must  achieve 
a  very  high  transfer  rate  of  information  with  extremely  limited 
context  and  without  repetition.  As  the  mission  criticality  increases, 
the  contextual  support  available  decreases.  At  the  present  time,  as  a 
result  of  testing  convenience,  there  is  only  one  widely  used, 
quantitative  measure  of  vocoder  intelligibility.  This  is  the 
Diagnostic Rhyme Test (DRT) developed and conducted by Dr. William
Voiers  of  Dynastat,  Inc.,  Austin,  Texas.  [139-146]  Voiers,  Air  Force 
personnel  and  others  (10,  68,  100,  119,  123,  135,  145)  have  determined 
that  systems  scoring  approximately  80  percent  on  the  DRT  provide 
reasonable  acceptability  of  the  intelligibility  of  the  transmitted 
speech  signal.  At  this  time,  new,  realistic,  conversational  tests  are 
being  developed  (113,  125)  which  could  produce  new  results  before  the 
implementation  of  the  LASARS.  All  intelligibility  scores  are  given 
for  acoustically  benign  environments. 
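DRT results are commonly reported as a chance-corrected score, 100(R - W)/T, where R and W are the numbers of right and wrong responses and T is the total number of items; because each DRT item offers two rhyming alternatives, subtracting W removes the advantage of guessing. The sketch below applies that correction and compares the result with the 80 percent criterion. The response counts are hypothetical, and every item is assumed to have been answered.

```python
def drt_score(right: int, wrong: int) -> float:
    """Chance-corrected DRT score in percent: 100 * (R - W) / T.

    Each item is a two-alternative forced choice, so pure guessing
    yields about 50% raw correct but a corrected score near 0.
    """
    total = right + wrong  # assumes every item was answered
    if total == 0:
        raise ValueError("no responses")
    return 100.0 * (right - wrong) / total

score = drt_score(right=864, wrong=96)  # hypothetical: 90% raw correct
print(score, score >= 80.0)  # 80.0 True
```

A raw 90 percent correct thus corresponds to a chance-corrected DRT score of 80.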

Input Probability of Error. In any communications link, errors occur which distort the incoming data. These errors result from a variety of sources such as lightning, sunspots, atmospheric temperature fluctuations, other communication transmissions, multipath transmission, etc. Communication systems must therefore be tolerant of a certain number of bit errors. The number of errors the system can tolerate directly affects the signal power level required out of the transmitter. Because an LPI communication system is aimed at operation at marginal levels, the more errors the system can tolerate, the larger the signal power reductions can be, making the system more attractive for LPI usage. It is desired that the synthesizer of the brassboard insertion system tolerate an input probability of error, P_e, of at least 10^-3.
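To make the 10^-3 figure concrete, the sketch below computes the average number of bit errors per second at a given data rate and, assuming independent (non-bursty) errors, the chance that a frame arrives error-free. The 54-bit frame is an assumption chosen to match the LPC-10 frame at 2400 b/s; the text does not specify a frame format.

```python
def expected_errors_per_second(bit_rate: float, pe: float) -> float:
    """Average number of errored bits per second."""
    return bit_rate * pe

def frame_error_free_probability(frame_bits: int, pe: float) -> float:
    """Probability that every bit of a frame is correct.

    Assumes independent bit errors; real channels often produce bursts,
    which this simple model does not capture.
    """
    return (1.0 - pe) ** frame_bits

print(expected_errors_per_second(2400, 1e-3))            # 2.4
print(round(frame_error_free_probability(54, 1e-3), 4))  # 0.9474
```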

Physical Parameters. Size, weight, and power consumption are highly correlated factors, and they are largely technology dependent. Higher levels of circuit integration mean reduced size, reduced weight, and reduced power consumption. A microprocessor analyzer/synthesizer is smaller, lighter, and consumes less power than a filter bank system. The brassboard system to be tested in the late 1980s is being designed to test the overall LPI system concepts. It will be flown in the cargo bay of a military cargo aircraft on pallet-mounted racks. At this stage, optimum size, weight, and power constraints will not apply. The specifications are chosen somewhat arbitrarily to ensure the equipment can be physically handled reasonably easily. Approximately two cubic feet and fifty pounds are believed to be reasonable limits. The power requirements are also not too restrictive: the test system will have an independent power source available to provide whatever power is needed, and one hundred watts is projected to be the maximum power necessary for a flight-testable brassboard system. More restrictive requirements apply to the LASARS implementation because those units would be permanently mounted in much smaller aircraft.

Data Input/Output Delay. Processing delay is important in two respects. First, if the system does not operate in real time, it is unacceptable. Real-time systems generate a continuous output given a continuous input, without having to pause to perform processing and without overloading and losing data. The second aspect is pipeline delay: the length of time needed to provide an initial output for an initial input. In two-way conversations, any time lag greater than 100 milliseconds is noticeable, and delays of 250 milliseconds or more make conversations hard (almost impossible) to conduct in a strategic or tactical environment where rapid communications are necessary.
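The two delay criteria can be expressed as simple checks. For a frame-based vocoder, real-time operation requires that, on average, each frame be processed within one frame period; the pipeline-delay thresholds (over 100 ms noticeable, 250 ms or more disruptive) are those stated above. The 22.5 ms frame period and 18 ms processing time in the example are hypothetical values.

```python
def is_real_time(processing_ms_per_frame: float, frame_ms: float) -> bool:
    """A frame-based coder keeps pace with continuous input only if it
    processes each frame (on average) within one frame period."""
    return processing_ms_per_frame <= frame_ms

def delay_rating(pipeline_delay_ms: float) -> str:
    """Classify pipeline delay using the thresholds given in the text."""
    if pipeline_delay_ms <= 100:
        return "acceptable"
    if pipeline_delay_ms < 250:
        return "noticeable"
    return "disruptive"

print(is_real_time(18.0, 22.5), delay_rating(75))  # True acceptable
```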

Availability. System availability occurs in two phases, near-term and far-term. Near-term systems exist as engineering prototypes with production models available in three to five years, or as systems currently in production. Far-term systems include the near-term systems, laboratory models available for production in six to ten years, and simulations of modifications to current laboratory models, engineering prototypes, or production systems. The "far-term-only" systems are not applicable to the brassboard development, but any of them on which research is continued are appropriate considerations for the LASARS implementation in the mid-1990s.
Cost. Production cost is the specification used in this analysis, although it is not the most accurate measure of system cost. Life cycle cost is more comprehensive. It includes development costs, production costs, costs for purchasing and stocking spare parts, maintenance costs, costs for training maintenance personnel, and replacement costs, all based on the life of the system and the life of the host equipment (i.e., the airframe). This information is not available from the vendors, and lack of personal experience precludes making estimates.

In the purchase of one or two systems for brassboard testing, cost is relatively unimportant: more can be paid per individual item to test a concept than to implement one. The managers of the LPI program have determined that $40,000 is not too much to pay for a test system. For production purposes, the vocoder portion of the LASARS should cost as little as possible, preferably less than $1,000 each. The production cost figures used are estimates only; the actual figure is usually established through bids and varies with quantity.

Binary Decision Parameters. Speaker dependence, vocabulary dependence, and system learning time constitute binary decision parameters. Any system which requires a specific speaker to achieve the required intelligibility or to be voice-recognizable by the listener is unacceptable. Vocabulary-dependent systems, usually utilizing some type of look-up table, are unacceptable because of the wide range of applications for the vocoder and the wide range of mission requirements within a single application. These parameters, then, establish a "go/no-go" limit on the systems under consideration.


Several systems are under investigation which utilize a look-up table to find a "best-match" pattern to the analyzed segment of
speech.  These  systems  update  the  patterns  in  memory  utilizing  a 
"least  accessed,  first  replaced"  algorithm.  In  this  type  of  system, 
intelligibility  is  maintained  but  the  ability  to  recognize  the  speaker 
occurs  only  after  the  new  speaker  has  caused  enough  spectral  patterns 
to  be  replaced.  The  pattern  replacement  is  called  system  "training" 
or  "learning"  time.  Instantaneous  speaker  recognition  is  necessary  in 
short  duration,  high  volume  communication  environments  involving  many 
different  speakers.  Therefore,  the  "learning"  time  required  must 
provide  "almost  instantaneous"  updating  of  the  memory  patterns. 
Although intelligibility is maintained, a very slight degradation occurs initially, which disappears as the system "learns."
This  increases  the  requirements  for  rapid  updating  of  the  system.  The 
systems  should  require  no  training  time. 
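The "least accessed, first replaced" update can be illustrated with a small pattern table. The sketch below is hypothetical: it uses Euclidean distance between short feature vectors and a fixed match threshold, neither of which is specified in the text, and it ignores how the updated patterns would be conveyed to the synthesizer. It shows why speaker recognition lags: only poorly matched frames cause replacements, so a new speaker's patterns displace old ones a few at a time.

```python
import math

class PatternCodebook:
    """Hypothetical best-match pattern table updated with a
    "least accessed, first replaced" rule."""

    def __init__(self, patterns: list[list[float]], threshold: float):
        self.patterns = [list(p) for p in patterns]
        self.hits = [0] * len(patterns)  # access count per stored pattern
        self.threshold = threshold       # assumed match-quality cutoff

    def match(self, frame: list[float]) -> int:
        """Return the index of the stored pattern used for `frame`."""
        dists = [math.dist(frame, p) for p in self.patterns]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] > self.threshold:
            # Poor match: overwrite the least-accessed pattern with the
            # new frame, i.e. the table "learns" the new speaker.
            best = min(range(len(self.hits)), key=self.hits.__getitem__)
            self.patterns[best] = list(frame)
            self.hits[best] = 0
        self.hits[best] += 1
        return best

book = PatternCodebook([[0.0, 0.0], [1.0, 1.0]], threshold=0.5)
print(book.match([0.1, 0.0]))  # 0: close to the first stored pattern
print(book.match([5.0, 5.0]))  # 1: poor match, least-accessed slot replaced
print(book.patterns)           # [[0.0, 0.0], [5.0, 5.0]]
```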

Most, if not all, of the parameter specifications discussed will be tightened when final consideration is made to determine the 1990s LASARS vocoder. Not much improvement in intelligibility is expected, although more robust operation is. Improved robustness brings greater immunity to acoustic noise in the analysis environment, as well as more naturalness (less mechanical-sounding output) in the synthesized speech. The new parameter specifications are those given in


CHAPTER  IV 


PRESENTATION  OF  VOCODER  SYSTEMS 


Overview 

The research conducted identified more than thirty vocoder systems existing as production equipment, working engineering prototype models, working laboratory models, or computer software simulations. This chapter presents each system with some general, qualitative information, including the developing organization, analysis/synthesis method, and strengths and weaknesses. A table with the quantitative system parameter values is provided for each system.

Vocoder  System  Presentation 

Table 4-1 gives the developing organization, system nomenclature, and the references in the Bibliography for each system. As indicated in Chapter III, systems which exist only as computer software simulations are inappropriate for either phase of the LPI Comm ADP by reason of nonavailability. Table 4-2 lists these nonavailable systems. Additional information on these systems can be found in the references cited in Table 4-1 and will not be included here. Most of the systems listed in Table 4-2 do not currently run in real time, and several could have been eliminated for
TABLE 4-1

VOCODER SYSTEMS IDENTIFIED

Developer/Producer                  System/Nomenclature                 Reference
ITT/Tri-Tac                         CV-3591 (ANDVT)                     20, 39, 61, 63, 64, 129, 131
USAF/ITT                            Frame Predictive LPC                125, 149
TI                                  VIS-Speech Processor                72
TI                                  Time Encoded LPC Roots              97
Motorola                            ATMMRP                              18, 30, 89
Motorola                            Manpack                             18, 30, 89
Motorola                            MNSVS                               18, 30, 89
E-Systems                           CV-3333 A/U                         9, 13, 22, 24, 147
E-Systems                           CV-3333/U                           9, 13, 25, 26, 147
E-Systems                           CV-3670/A                           9, 13, 23, 25, 27, 147
E-Systems                           LPC-24                              9, 13, 28, 147
GTE                                 MRD-2000G                           51, 52, 79, 148
GTE                                 UVD-2000                            51, 52, 79, 148
GTE                                 CV-3832 (MRVT)                      53, 79, 148
GTE                                 TDHS                                73
MIT Lincoln Labs (LL)               Compact LPC                         29, 57
LL                                  Adaptive Subband Formant Analysis   87
LL                                  SEE                                 99, 100
LL                                  800 b/s SEE                         98, 100, 101
LL                                  Wideband SEE                        100
LL                                  Frame Fill LPC                      10, 11, 100, 101
LL                                  Channel                             48
LL                                  Pattern Matching Channel            43, 44
LL                                  Vector Quantized LPC                10, 100, 101
LL                                  Homomorphic Prediction              69
Bolt, Beranek, & Newman (BBN)       Segment Quantization                108
BBN                                 Single Frame Quantization           108
BBN                                 HDV LPC                             138
BBN                                 Variable Order Markov               107
Naval Research Lab (NRL)            Linear Predictive Formant           67
NRL                                 Line Spectrum Pairs                 37
NRL/TRW Corp.                       Vector Quantized LPC                36, 37, 66
Stanford Research Inst.             RELP                                133, 134
Univ. of Notre Dame                 RELP                                17
Signal Technology, Inc.             Vector Quantized LPC                150
Korean Advanced Inst. of Science    Low-Rate Digital Formant            132
DoD                                 Differential LPC                    38
DoD/CNR, Inc.                       LPC-10 Formant                      92

TABLE 4-2

NONAVAILABLE SYSTEMS OR ALGORITHMS

Developer/Producer                  System
GTE                                 TDHS
Stanford Research Inst.             RELP
Univ. of Notre Dame                 RELP
Korean Advanced Inst. of Science    Low-Rate Digital Formant
LL                                  Pattern Matching Channel
LL                                  Adaptive Subband Formant Analysis
LL                                  Wideband SEE
LL                                  SEE
LL                                  800 b/s SEE
LL                                  Vector Quantized LPC
LL                                  Homomorphic Prediction
TI                                  Time Encoded LPC Roots
BBN                                 Segment Quantization
BBN                                 Single Frame Quantization
BBN                                 HDV LPC
BBN                                 Variable Order Markov
NRL                                 Linear Predictive Formant
NRL                                 Line Spectrum Pairs
DoD                                 Differential LPC
DoD/CNR, Inc.                       LPC-10 Formant
Signal Technology, Inc.             Vector Quantized LPC

other  reasons  such  as  data  rate  too  high,  single  speaker  restrictions 
or  speech  intelligibility  too  low. 


Twenty-one systems or algorithms have been eliminated as nonavailable in the desired time frame. This leaves seventeen systems to be considered in either the near-term or far-term application evaluations. Table 4-3 lists the systems appropriate for the brassboard insertion, as determined by the availability criteria given in Chapter III. Table 4-4 lists the systems which, in addition to those in Table 4-3, are appropriate for inclusion in the production LPI Comm radio system.

TABLE 4-3

SYSTEMS CONSIDERED FOR NEAR-TERM BRASSBOARD

Developer/Producer    System
ITT                   CV-3591 (ANDVT)
Motorola              ATMMRP
Motorola              Manpack (83-2791)
Motorola              Miniaturized NSV System
E-Systems             CV-3333 A/U
E-Systems             CV-3333/U
E-Systems             CV-3670/A
E-Systems             LPC-24
GTE                   MRD-2000G
GTE                   UVD-2000
GTE                   CV-3832 (MRVT)
NRL/TRW Corp.         Vector Quantized LPC

Vocoder System Descriptions: Brassboard Applicable

In this section the vocoder systems listed in Table 4-3 are described. These are the systems applicable for the late 1980s brassboard insertion. Each description consists of a paragraph giving the developer/producer, engineering status, information about the analysis/synthesis methods used, and any appropriate qualitative comments, followed by a table listing all of the quantitative data available for the system.


TABLE 4-4

ADDITIONAL SYSTEMS FOR FAR-TERM PRODUCTION CONSIDERATIONS

Developer/Producer    System
USAF/ITT              Frame Predictive LPC
LL                    Compact LPC
LL                    Frame Fill LPC
LL                    Channel
TI                    VIS-Speech Processor

ITT— CV  3591  (ANDVT)  Vocoder.  The  Advanced  Narrowband  Digital 
Voice  Terminal  (ANDVT)  is  a  DoD  approved,  2400  bit/ second,  LPC 
vocoder.  This  government-wide  system  is  the  result  of  extensive 
research  by  ITT  with  a  tri -service  research  committee.  It  is  sche¬ 
duled  to  go  into  production  within  a  couple  of  months  with  all  service 
branches  purchasing  units.  The  system  contains  an  adaptive  acoustical 
noise  cancellation  algorithm  for  improving  speech  intelligibility  in 
tactical  environments.  It  has  tw^  transmission  modes,  HF  and  Line-of- 
Sight.  This  unit  is  to  be  utilized  in  all  Air  Force  communication 
systems  now  existing  which  require  low  data  rates  for  secure  voice 
applications.  The  format  of  this  system  has  even  been  accepted  as  a 
North  Atlantic  Treaty  Organization  (NATO)  standard.  The  coding  con¬ 
sists  of  a  mixture  of  pitch  and  amplitude  semi-logarithmic  coef¬ 
ficients,  log-area-ratio  and  linear  coefficients  for  the  filter 


transfer  function  specifications,  and  error  detection  and  correction 
bits.  The  system  also  has  the  capability  to  transmit  nonvoice  data  at 


300,  600,  1200,  and  2400 

b/s.  Table  4-5  lists  the  quantitative  para- 

meter  values. 

TABLE 4-5

ITT CV-3591

Parameter                  Value
Vocoder Method             LPC-10
Data Rate                  2400 b/s
Intelligibility            Approximately 83%
Input Pe                   10^-3
Size                       750 in³
Weight                     20 lb
Power                      45 W
Processing Delay           50-60 ms
Availability               Production
Production Cost            $20,000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None

Motorola: ATMMRP Vocoder. The Advanced Technology Model Multi-Rate Processor (ATMMRP) LPC Vocoder is a small, low-power, 2400 b/s, full-duplex LPC vocoder. It is compatible with the DoD ANDVT. It employs an MC68000 microcomputer and CMOS circuits. It uses partial correlation (PARCOR) analysis to determine the filter transfer function specifications. An additional output is provided for Residual Excited LPC (RELP) coding (9600 b/s). The system consists of several microprogrammed digital signal processing ICs. It currently exists at Motorola, Inc., Scottsdale, Arizona, as an operational engineering prototype model. Table 4-6 lists the quantitative values used in the comparison.


TABLE 4-6

MOTOROLA ATMMRP

Parameter                  Value
Vocoder Method             LPC
Data Rate                  2400 b/s
Intelligibility            Approximately 89%
Input Pe                   [illegible] x 10^-3
Size                       440 in³
Weight                     9.1 lb
Power                      4.2 W
Processing Delay           60-60 ms
Availability               Working Model
Production Cost            $6000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None

Motorola: Manpack (83-2791) Vocoder. The Manpack is an extension of the ATMMRP chip set with further reductions in size. It implements parallel processing techniques in three signal processing chips to accomplish the size and power reductions. The RELP output is deleted, but compatibility with the ANDVT is maintained. The voiced/unvoiced (V/UV) decision and pitch tracking algorithms have been improved to optimize performance for noisy, rapid communications. A high-performance automatic gain control has been designed and included especially for the rapid communication environment. The system exists as a working engineering prototype model. Table 4-7 lists the quantitative parameter values.


TABLE 4-7

MOTOROLA MANPACK

Parameter                  Value
Vocoder Method             LPC-10
Data Rate                  2400 b/s
Intelligibility            Approximately 89%
Input Pe                   5 x 10^-3
Size                       90 in³
Weight                     4.1 lb
Power                      2 W
Processing Delay           50-60 ms
Availability               Working Model
Production Cost            $7000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None


Motorola: MNSVS Vocoder. The Miniaturized Narrowband Secure Voice System (MNSVS) is a further reduction in size over the Manpack. It uses flatpack and leaded chip carrier packaging instead of dual in-line packages (DIPs) to accomplish these reductions. Included in the system is a dedicated microprocessor to provide a variety of security levels. This system also exists as a working model at Motorola, Inc., Scottsdale, Arizona. Table 4-8 lists the quantitative values. The Motorola, Inc., Communications Division Product Information Report [89] contains photographs of all three of the Motorola vocoders presented here.


E-Systems: CV-3333/U. The CV-3333/U Mil Spec Digital Speech Processor is a full/half-duplex, 2400 b/s channel vocoder being produced for the U.S. Navy. Its output can be multiplexed with other data streams to allow simultaneous voice and data transmission. The system can be operated from a standard telephone input. It is compatible with the HY-2 channel vocoder, which it is replacing (the HY-2 is constructed with discrete components (119)). Table 4-9 gives the parameter values used in the comparisons.


TABLE 4-8

MOTOROLA MNSVS

Parameter                  Value
Vocoder Method             LPC
Data Rate                  2400 b/s
Intelligibility            Approximately 89%
Input Pe                   10^-2
Size                       20 in³
Weight                     1.5 lb
Power                      2 W
Processing Delay           50-60 ms
Availability               Working Model
Production Cost            $8000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None

E-Systems: CV-3333 A/U. The CV-3333 A/U Audio-Digital Converter is a full/half-duplex, 2400 b/s LPC vocoder. The system is in production at the E-Systems Garland Division. It is compatible with the ANDVT and also provides compatibility with the HY-2 channel vocoder. The LPC filter parameters are specified through the use of reflection coefficients, with standard V/UV decision and pitch tracking algorithms employed. Like the CV-3333/U, it can be multiplexed with other data streams. Both systems contain self-test subroutines for quick fault isolation and repair. Repair is accomplished by board replacement. Table 4-10 gives the quantitative values.

TABLE 4-9

E-SYSTEMS CV-3333/U

Parameter                  Value
Vocoder Method             Channel
Data Rate                  2400 b/s
Intelligibility            Approximately 88%
Input Pe                   2 x 10^-3
Size                       2800 in³
Weight                     55 lb
Power                      200 W
Processing Delay           Approximately 75 ms
Availability               Production
Production Cost            Approximately $25,000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None

E-Systems: CV-3670/A. The CV-3670/A Airborne Digital Speech Processor is a 2400 b/s LPC vocoder/secure voice system that is smaller than the CV-3333 A/U (half the parts count) and has improved intelligibility and quality. The CV-3670/A is currently in use aboard the AWACS and other USAF aircraft. The system is ANDVT compatible. The speech signal analysis is performed using an E-Systems proprietary algorithm based on the reflection coefficients. It is compatible with standard security equipment and has been designed for minimum electromagnetic emissions. It interconnects with the aircraft intercom system for I/O. The basic unit can be mounted almost anywhere




TABLE 4-10

E-SYSTEMS CV-3333 A/U

Parameter                  Value
Vocoder Method             LPC
Data Rate                  2400 b/s
Intelligibility            Approximately 90%
Input Pe                   2 x 10^-3
Size                       2800 in³
Weight                     45 lb
Power                      100 W
Processing Delay           Approximately 75 ms
Availability               Production
Production Cost            Approximately $22,500
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None

through use of the remote control unit provided. As indicated, this unit exists in the Air Force inventory. Table 4-11 gives the quantitative data. The CV-3333/U, CV-3333 A/U, and CV-3670/A all interface with MIL-STD-188 cryptographic units.

E-Systems: Model LPC-24. The Model LPC-24 Digital Speech Processor is a 2400 b/s LPC commercial vocoder which has been sold in quantity internationally. It has high speech quality. It is designed for operation on a dedicated network. Optional equipment is available to allow the LPC-24 to be operated as a dedicated single-use terminal and to expand the LPC-24 capability to include a teleprinter channel. Table 4-12 lists the system's quantitative parameter values.


TABLE 4-11

E-SYSTEMS CV-3670/A
(remote unit/remote control)

Parameter                  Value
Vocoder Method             LPC
Data Rate                  2400 b/s
Intelligibility            Approximately 90%
Input Pe                   2 x 10^-3
Size                       728 in³ / 78 in³
Weight                     20 lb / 1 lb
Power                      90 W / 7 W
Processing Delay           Approximately 75 ms
Availability               Production
Production Cost            Approximately $50,000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None

TABLE 4-12

E-SYSTEMS LPC-24

Parameter                  Value
Vocoder Method             LPC
Data Rate                  2400 b/s
Intelligibility            Approximately 90%
Input Pe                   2 x 10^-3
Size                       390 in³
Weight                     20 lb
Power                      100 W
Processing Delay           Approximately 75 ms
Availability               Production
Production Cost            Approximately $10,000
Speaker Dependence         None
Vocabulary Dependence      None
System "Learning" Time     None


GTE Systems--MRD-2000G. The MRD-2000G Voice Digitizer is a
2400 b/s LPC vocoder designed for military narrowband secure voice
systems. It uses a GTE proprietary implementation of LPC
analysis/synthesis designated LPC 10/42. The LPC output is ANDVT
compatible. The system can be supplied with optional, switch-selectable
voice processing algorithms: adaptive predictive coding (APC) at
7200 b/s and sub-band coding (SBC) at 9600 b/s. A second option, added
to the basic LPC to make the system more compatible with military
systems, is a set of channel vocoder outputs. These outputs are
compatible with E-Systems' CV-3333 vocoders, the HY-2, and Great
Britain's Belgarde channel vocoder (another discrete component system).
The system contains echo suppression circuitry providing 60 dB of echo
suppression, and self-test circuitry for fast fault isolation. A
MIL-STD-188C digital interface connects the unit to external encryption
devices and other data communications equipment. I/O is through a
handset telephone receiver. Table 4-13 lists the quantitative
parameter values.

GTE Systems--UVD-2000. The UVD-2000 Voice Digitizer is a commercial
version of the MRD-2000G. It uses a GTE proprietary LPC-10 algorithm
(a 10-pole model, as in the ANDVT) for analysis and synthesis. All of
the options and built-in features of the MRD-2000G are also available.
A standard RS-232C interface provides the connection to the A/D and D/A
circuits. Table 4-14 gives the quantitative values.


TABLE 4-13

GTE SYSTEMS MRD-2000G

Parameter                     Value
Vocoder Method                LPC
Data Rate                     2400 b/s
Intelligibility               Approximately 90%
Input Pe                      10^-3
Size                          1666 in3
Weight                        30 lb.
Power                         70 W
Processing Delay              50-60 ms
Availability                  Production
Production Cost               Approximately $15,000
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

TABLE 4-14

GTE SYSTEMS UVD-2000

Parameter                     Value
Vocoder Method                LPC
Data Rate                     2400 b/s
Intelligibility               Approximately
Input Pe                      10^-3
Size                          1020 in3
Weight                        20 lb.
Power                         60 W
Processing Delay              50-60 ms
Availability                  Production
Production Cost               Approximately
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

GTE Systems--CV-3832 MRVT. The GTE Systems Multiple Rate Voice
Terminal (MRVT) is an LPC-based multiple-rate vocoder. It provides
simultaneous outputs at 16000, 9600, and 2400 b/s. Each output is coded
independently. The 2400 b/s output forms the basis, with data bits
added to the stream to generate the higher data rates, forming an
embedded data scheme. The terminal can therefore converse with vocoders
at different data rates simultaneously. It can also provide an
interface between two systems at different rates, with quality limited
to that of the system with the lower data rate. The 9600 b/s output is
in RELP format (prediction residual bits are used; see Table 4-1 for
more information and references). The 16000 b/s stream is approximately
telephone toll quality. Built-in test capabilities are also provided.
The 2400 b/s data stream is ANDVT compatible.

Table  4-15  lists  the  system  parameter  values. 
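The embedded data scheme can be illustrated with a minimal Python sketch. For illustration only, it assumes that every rate shares the 22.5 ms frame period of standard LPC-10 (the MRVT frame period is not given here). Under that assumption each higher-rate frame carries the 2400 b/s frame as a prefix followed by enhancement bits, and a lower-rate receiver, or a rate-converting interface, simply keeps the leading bits for its own rate. The function names are hypothetical.

```python
from typing import List

# Sketch of an embedded (layered) bit stream. ASSUMPTION: all rates use a
# 22.5 ms frame period. Frames are plain lists of 0/1 bits.
FRAME_PERIOD_TENTHS_MS = 225  # 22.5 ms, expressed in tenths of a millisecond
EMBEDDED_RATES_BPS = (2400, 9600, 16000)


def bits_per_frame(rate_bps: int) -> int:
    """Number of bits one frame carries at the given rate."""
    bits, remainder = divmod(rate_bps * FRAME_PERIOD_TENTHS_MS, 10_000)
    if remainder:
        raise ValueError(f"{rate_bps} b/s does not give whole bits per frame")
    return bits


def build_frame(layers: List[List[int]]) -> List[int]:
    """Concatenate the core layer and enhancement layers into one frame.

    layers[0] is the 2400 b/s core; layers[i] holds the extra bits that lift
    the stream from EMBEDDED_RATES_BPS[i-1] to EMBEDDED_RATES_BPS[i].
    """
    frame: List[int] = []
    for rate, layer in zip(EMBEDDED_RATES_BPS, layers):
        frame.extend(layer)
        if len(frame) != bits_per_frame(rate):
            raise ValueError(f"layer sizes do not match {rate} b/s")
    return frame


def strip_to_rate(frame: List[int], rate_bps: int) -> List[int]:
    """Keep only the prefix that a receiver operating at rate_bps understands."""
    n = bits_per_frame(rate_bps)
    if len(frame) < n:
        raise ValueError("frame is shorter than the requested rate")
    return frame[:n]
```

Under these assumptions a 2400 b/s frame is 54 bits, a 9600 b/s frame 216 bits, and a 16000 b/s frame 360 bits, so `strip_to_rate(frame, 2400)` applied to a 16000 b/s frame yields exactly the 54-bit core.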


NRL/TRW Corp.--Vector Quantized LPC Vocoder. The Naval
Research Laboratory (NRL) is currently having TRW build a Vector
Quantized LPC Low Data Rate Voice Terminal (LDRVT) engineering
prototype. The system has two switch-selectable outputs: standard
2400 b/s LPC and vector quantized LPC at 800 b/s. Vector quantization
is performed by matching the reflection coefficients of each data frame
against a set of patterns stored in memory and then transmitting the
index of the matching pattern instead of the coefficients. The
2400 b/s stream is ANDVT compatible. The system can also provide an
interface between systems at each data rate, with quality limited to
that of the lower-rate system. Some quality and intelligibility
degradation occurs because no reasonably sized set of patterns can
completely model

TABLE 4-15

GTE SYSTEMS MRVT

Parameter                     Value
Vocoder Method                LPC
Data Rate                     2400/9600/16000 b/s
Intelligibility               Approximately 90% at 2400 b/s
Input Pe                      10^-3
Size                          3242 in3
Weight                        55 lb.
Power                         200 W
Processing Delay              Approximately 60 ms
Availability                  Production
Production Cost               Approximately $25,000
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

the full range of speech production. Table 4-16 gives the
quantitative values used for comparison purposes.
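The pattern-matching step can be sketched as a nearest-neighbour codebook search. This is a minimal illustration, not the NRL/TRW design. It assumes a squared Euclidean distance between reflection-coefficient vectors (a practical coder would more likely use a spectral distortion measure), and the codebook below is a made-up four-entry example. Only the index is transmitted, so a codebook of size M costs ceil(log2 M) bits per frame.

```python
import math
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distortion measure: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the stored pattern closest to this frame."""
    best_index, best_dist = 0, math.inf
    for index, pattern in enumerate(codebook):
        d = squared_distance(frame, pattern)
        if d < best_dist:
            best_index, best_dist = index, d
    return best_index


def vq_decode(index: int, codebook: Sequence[Sequence[float]]) -> Sequence[float]:
    """The receiver looks the pattern up in its own copy of the codebook."""
    return codebook[index]


def index_bits(codebook_size: int) -> int:
    """Bits needed to transmit one codebook index."""
    return max(1, math.ceil(math.log2(codebook_size)))


# Hypothetical 4-entry codebook of 2-coefficient patterns (illustration only).
CODEBOOK = [(0.0, 0.0), (0.9, -0.5), (-0.7, 0.3), (0.2, 0.8)]
```

The degradation described above corresponds to the residual distance between a frame and its chosen pattern, which no finite codebook can drive to zero for all speech.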

Vocoder System Descriptions--LASARS Applicable

The preceding paragraphs described those systems that exist as some
form of functioning hardware. These systems, along with those listed in
Table 4-4, are considered a reasonably complete list of systems that
should be through all testing stages and implementable in production by
approximately 1992 or 1993 for inclusion in the LASARS. This section
describes the systems listed in Table 4-4.


USAF/ITT--Frame Predictive LPC Vocoder. Under an Air Force
monitored contract, ITT is designing a 400 b/s vocoder.
This vocoder implements 2400 b/s LPC, which is then vector quantized to


TABLE 4-16

NRL/TRW VECTOR QUANTIZED LPC

Parameter                     Value
Vocoder Method                LPC
Data Rate                     800 b/s
Intelligibility               84%
Input Pe                      Not tested
Size                          2400 in3
Weight                        30 lb.
Power                         90 W
Processing Delay              250 ms
Availability                  Engineering Prototype
Production Cost               Not Available
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

800 b/s. The 800 b/s data stream is then frame predicted, and the
pitch, V/UV, and gain parameters are coded by fake process trellis
coding with variable rate coding to further reduce the data rate to
under 400 b/s (149). Frame prediction removes the frame-to-frame
redundancy in the LPC filter parameters through frame repeat coding.
Once a frame of data has been transmitted, the following frame is not
transmitted if it is "close enough" under some distortion measure, such
as those of Itakura [50] or Wong [149]. Instead, a one-bit-per-frame
repetition flag (repeat/not repeat) is transmitted. The fake process
trellis coding of the excitation parameters is performed independently
of the vector quantization and the frame prediction. A search
algorithm, e.g., Viterbi or ML (149), is employed to find the code that
minimizes the expected distance between the input excitation
parameters and the encoded output.
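Frame repeat coding reduces to a simple rule: send a frame only when it differs enough from the last frame the receiver actually holds. The sketch below is illustrative. It assumes a squared Euclidean distance and an arbitrary threshold in place of the Itakura or Wong measures cited above, and it does not model the trellis coding of the excitation parameters. The flag is modeled as 1 = repeat and 0 = new frame follows.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, ...]
Symbol = Tuple[int, Optional[Frame]]  # (repeat flag, frame or None)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed stand-in for the distortion measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def repeat_encode(frames: Sequence[Frame], threshold: float) -> List[Symbol]:
    symbols: List[Symbol] = []
    last_sent: Optional[Frame] = None
    for frame in frames:
        # Compare with the last *transmitted* frame so that encoder and
        # decoder always agree on what a "repeat" reproduces.
        if last_sent is not None and squared_distance(frame, last_sent) <= threshold:
            symbols.append((1, None))   # one-bit repeat flag only
        else:
            symbols.append((0, frame))  # flag plus the full frame
            last_sent = frame
    return symbols


def repeat_decode(symbols: Sequence[Symbol]) -> List[Frame]:
    frames: List[Frame] = []
    last: Optional[Frame] = None
    for flag, payload in symbols:
        if flag == 1:
            if last is None:
                raise ValueError("repeat flag before any frame was received")
            frames.append(last)
        else:
            last = payload
            frames.append(payload)
    return frames
```

Comparing against the last transmitted frame, rather than the immediately preceding input frame, keeps a slow drift in the parameters from accumulating into a large error at the receiver.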

This frame predictive LPC method currently exists as a software
algorithm, hosted on a VAX computer, that modifies an LPC input. All of
the additional processing can be implemented with programmable signal
processing chips, which could be added to ITT's existing ANDVT design
with little re-engineering. ITT states that, if given the "go-ahead"
(funding), it could have a working model in less than a year (125).
Table 4-17 lists the parameter specifications.

TABLE 4-17

USAF/ITT FRAME PREDICTIVE

Parameter                     Value
Vocoder Method                LPC
Data Rate                     400 b/s
Intelligibility               78.9%
Input Pe                      10^-2
Size                          Unknown (mainframe simulation)
Weight                        Unknown (mainframe simulation)
Power                         Unknown (mainframe simulation)
Processing Delay              Approximately 300 ms
Availability                  Simulation
Production Cost               Not Available
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

LL--Compact LPC Vocoder. MIT's LL, located at Hanscom AFB,
Massachusetts, is performing extensive research into vocoder design
and algorithm improvement. The Compact LPC vocoder is a single-card
laboratory model system. It is small, low power, and relatively
inexpensive. It is a 2400 b/s system using only commercially available
devices. An autocorrelation analysis is performed to generate the
reflection coefficients. An Intel 8085 controls and supervises the
functions within the LPC analyzer, the synthesizer, and the Gold pitch
detector (41). The system is designed for use with a compact packet
voice terminal. The system parameters are listed in Table 4-18.
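An autocorrelation analysis of this kind is usually completed with the Levinson-Durbin recursion, which turns the autocorrelation values of a frame into predictor and reflection coefficients. The Python sketch below shows that textbook recursion. It is not taken from the LL design, it omits windowing and pre-emphasis, and its sign convention for the reflection coefficients (k_i equals the last predictor coefficient at stage i) is one of several in use. A 10-pole analysis would call it with `order=10`.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Return (a, k, err).

    a[j-1] is the predictor coefficient in x[n] ~ sum_j a_j * x[n - j],
    k[i-1] is the reflection coefficient at stage i, and err is the final
    prediction error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)  # a[0] is unused
    k = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error vanished; lower the order")
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k_i = acc / err
        k[i - 1] = k_i
        updated = a[:]
        updated[i] = k_i
        for j in range(1, i):
            updated[j] = a[j] - k_i * a[i - j]
        a = updated
        err *= 1.0 - k_i * k_i
    return a[1:], k, err
```

For example, the autocorrelation sequence [1, 0.5, 0.25] belongs to a first-order process, so the recursion returns a first predictor coefficient of 0.5, a second reflection coefficient of 0, and an error of 0.75. For a valid autocorrelation sequence every |k_i| is below 1, which guarantees a stable synthesis filter.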


TABLE 4-18

LL COMPACT LPC VOCODER

Parameter                     Value
Vocoder Method                LPC
Data Rate                     2400 b/s
Intelligibility               89%
Input Pe                      Not tested
Size                          18 in3 (50-100 in3 with packaging)
Weight                        Approximately 0.75 lb.
Power                         5.5 W
Processing Delay              Approximately 90 ms
Availability                  Laboratory Model
Production Cost               Approximately $1000
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

LL--Channel Vocoder. Most channel vocoder research in the U.S. was
dropped with the advent of LPC analysis/synthesis techniques. Since
about 1980, interest in the channel vocoding method has increased, and
B. Gold (44) at LL has largely led this renewal. This LSI vocoder
design, which uses charge coupled devices (CCDs), is a switch-selectable
multirate system with data rates of 1200, 2400, 3600, and 4800 b/s.
Quality and intelligibility improve with each increase in data rate.
Control and coordination are provided by an Intel 8085A-2, with an
additional 8085A-2 performing the Gold pitch extraction and V/UV
decision. The input spectrum is divided into 19 channels for analysis.
The coefficients generated specify the filter response of the receiving
synthesizer. Input is from a standard telephone handset. The system is
small, light, and power efficient.

LL presently has a working laboratory model. Quantitative values are
listed in Table 4-19.
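The 19-channel analysis can be approximated in software by measuring the energy in 19 frequency bands of each frame. The sketch below is only an illustration. It groups the bins of a directly computed DFT into equal-width bands between 0 Hz and half the sampling rate, whereas a hardware channel vocoder such as this CCD design uses a bank of bandpass filters whose spacing is not given here. The 8 kHz sampling rate in the example is an assumption.

```python
import cmath
import math
from typing import List, Sequence


def channel_energies(frame: Sequence[float], fs: float, n_channels: int = 19) -> List[float]:
    """Energy per channel, with channels taken as equal-width bands from 0 to fs/2.

    ASSUMPTION: equal-width bands; real channel vocoders typically use
    unequally spaced analysis filters.
    """
    n = len(frame)
    nyquist = fs / 2.0
    energies = [0.0] * n_channels
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * fs / n
        X = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        band = min(int(freq / nyquist * n_channels), n_channels - 1)
        energies[band] += abs(X) ** 2
    return energies
```

For a 1000 Hz tone sampled at 8 kHz, the energy lands in channel index 4, whose band covers roughly 842 to 1053 Hz. At the receiver, these per-channel levels would set the gains of a matching synthesis filter bank.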

TABLE 4-19

LL CHANNEL VOCODER

Parameter                     Value
Vocoder Method                Channel
Data Rate                     1.2/2.4/3.6/4.8 kb/s
Intelligibility               Not Available
Input Pe                      Not Available
Size                          215 in3
Weight                        7 lb.
Power                         5.3 W
Processing Delay              Not Available
Availability                  Laboratory Model
Production Cost               Not Available
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

LL--Frame Fill LPC Vocoder. This LL vocoder is another laboratory
model system. It starts with a basic 2400 b/s vocoder and produces
switch-selectable 1200 b/s and 2400 b/s outputs. The frame fill
technique deletes every other analysis frame from transmission. Two
additional bits tell the synthesizer which method to use to fill in for
the missing frame. Three possibilities exist for regenerating the
missing frame: either adjacent frame can be used, or a weighted
combination of the two transmitted frames can be used. With an input
probability of error less than 10^-2, the resynthesized speech is
virtually indistinguishable from the original 2400 b/s LPC speech. The
available quantitative data are listed in Table 4-20. Reports on this
system are as yet unpublished.
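The frame fill idea can be sketched as follows. The encoder keeps every even-numbered frame, and for each dropped frame it picks whichever of three fill rules best reproduces it, sending the choice as a 2-bit selector. The three rules used here (repeat the previous frame, repeat the next frame, or average the two) are an assumed instance of "either adjacent frame or a weighted combination"; the actual LL weights are not given. Distances use squared Euclidean distance, also an assumption, and the sketch requires an odd number of frames so that every dropped frame has two transmitted neighbours.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fill(selector: int, prev: Frame, nxt: Frame) -> Frame:
    """Assumed fill rules: 0 = previous, 1 = next, 2 = equal-weight average."""
    if selector == 0:
        return prev
    if selector == 1:
        return nxt
    return tuple(0.5 * (p + q) for p, q in zip(prev, nxt))


def frame_fill_encode(frames: Sequence[Frame]) -> Tuple[List[Frame], List[int]]:
    if len(frames) % 2 == 0:
        raise ValueError("sketch expects an odd number of frames")
    sent = list(frames[0::2])
    selectors: List[int] = []
    for i in range(1, len(frames), 2):  # every dropped frame
        prev, cur, nxt = frames[i - 1], frames[i], frames[i + 1]
        best = min(range(3), key=lambda s: squared_distance(fill(s, prev, nxt), cur))
        selectors.append(best)  # a value 0..2 fits in the 2 extra bits
    return sent, selectors


def frame_fill_decode(sent: Sequence[Frame], selectors: Sequence[int]) -> List[Frame]:
    out: List[Frame] = []
    for j, frame in enumerate(sent):
        out.append(frame)
        if j < len(selectors):
            out.append(fill(selectors[j], sent[j], sent[j + 1]))
    return out
```

Since a 2-bit field can hold four values and only three rules are described, one code point is left unused in this sketch.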


TABLE 4-20

LL FRAME FILL LPC

Parameter                     Value
Vocoder Method                Frame Fill LPC
Data Rate                     1200/2400 b/s
Intelligibility               84%
Input Pe                      10^-2
Size                          700 in3
Weight                        22 lb.
Power                         70 W
Processing Delay              250 ms
Availability                  Laboratory Model
Production Cost               Not Available
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

TI--VIS-Speech Processor Board. Texas Instruments (TI) has performed
extensive speech processing research. Its efforts resulted in some of
the earliest speech synthesis systems, most notably the "Speak and
Spell" educational toy line. The Voice Interactive Set (VIS) Speech
Processor Board incorporates several functions: it performs LPC
analysis to provide limited speaker-dependent speech recognition,
limited speaker verification, and vocoding. The board is designed to
interface with a computer system, but I/O, A/D conversion, and D/A
conversion can be provided by adding a codec, with interconnections to
almost any analog input/output device (telephone handset, intercom
microphone/speaker, etc.). The output format of the reflection
coefficients is ANDVT compatible. Good quality results have been
obtained (72) with the vocoder operating in acoustic backgrounds from
104 dB (helicopter environment) up to 116 dB (other aircraft
environments). The board uses advanced device packaging technology to
achieve an extremely small 2400 b/s system; all three functions occupy
about 15 square inches of circuit board space. This vocoder is
currently available only as an engineering prototype. The quantitative
values are listed in Table 4-21.

In the preceding descriptions, compatibility with the ANDVT LPC vocoder
and the HY-2 channel vocoder has been stressed where applicable,
because the HY-2 or modifications of it (KY-537, an SSI
implementation) are currently in use by the Air Force, and because the
ANDVT is being purchased for large-scale deployment within the Air
Force in the immediate future. Compatibility with either of these
systems is not a requirement for the brassboard test system. At
present, it is not a requirement for the LASARS either, because it has
not yet been considered (54). This could change when final
specifications are established for the vocoder to be inserted into the
LASARS. Conversations with several engineers at LL and the Air Force
Rome Air Development Center (RADC) indicated that 400 and 800 b/s LPC
vocoders whose basic 2400 b/s structure is ANDVT compatible could also
be ANDVT compatible, with reduced intelligibility and quality.

TABLE 4-21

TI VIS-SPEECH PROCESSOR PARAMETER VALUES

Parameter                     Value
Vocoder Method                LPC
Data Rate                     2400 b/s
Intelligibility               88.4%
Input Pe                      10^-5
Size                          15 in3 (50-100 in3 with packaging)
Weight                        6 oz (0.375 lb.)
Power                         15 W
Processing Delay              50-60 ms
Availability                  Engineering Prototype
Production Cost               $3,500
Speaker Dependence            None
Vocabulary Dependence         None
System "Learning" Time        None

The systems presented are far from the final word in vocoder
technology. Entirely different concepts are currently being explored as
new methods of performing speech analysis. All of the current schemes
are based on the human speech production system. Flanagan [32] and
Gold [47] have proposed vocoder analysis methods based upon the
properties of the human auditory system. This is thought to be a viable
approach because of "the fact that the human peripheral auditory system
is a superior signal processor to that of the vocoder" (47). This
research concentrates on duplicating the functions of the auditory
system in electrical hardware. These include the functions of the outer
ear, the middle ear (the hammer, anvil, and stirrup), and the cochlea
and its cilia, which together form the chain that transforms the signal
from sound waves into electrical impulses. The Lincoln Laboratory
report by Gold and Tierney [47] discusses this concept extensively.
This method of vocoding is a very long way from implementation, but it
is expected to provide excellent results for low-rate, highly
intelligible speech transmission by the end of the century.
CHAPTER V


OPTIMUM  VOCODER  SYSTEM  SELECTION 


Overview 

The preceding chapters have provided general descriptions and laid the
groundwork for the final selections. This chapter applies the
methodology developed in Chapter III to each of the systems presented
in Chapter IV, except for the nonavailable systems. The
Figure-of-Merit, Fs, is computed individually for each system. The
results of these computations are used to identify the optimum systems
as candidates for the brassboard effort and the LASARS effort.

System  Evaluations 

The evaluation is performed in three stages. First, a table similar
to Table 3-2 is filled out for each system listed in Table 4-3 and
described in Chapter IV. The three systems from this group achieving
the highest Fs are the most likely candidates for the brassboard
application. Next, a table similar to Table 3-6 is filled out for each
system listed in Tables 4-3 and 4-4, using the additional descriptions
also found in Chapter IV. The three systems now achieving the highest
Fs are the most likely candidates for the LASARS application. Again,
the systems listed in Table 4-2 are deleted from consideration.
Finally, the top three candidates in each category are compared
qualitatively on the basis of information and features that do not
lend themselves to quantitative analysis.
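The figure-of-merit bookkeeping can be sketched as a weighted sum of mapped parameter scores. This is a hypothetical illustration rather than the Chapter III procedure itself: the weights, the per-parameter minimum scores, and the way missing values are renormalized (scaling by the total weight over the weight actually present) are assumptions chosen to mirror the renormalized ("R") and nonacceptable ("*") codes used in Table 5-2.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    weight: float      # hypothetical weight of this parameter in Fs
    minimum_fp: float  # hypothetical minimum mapped score for acceptability


def figure_of_merit(
    fp: Dict[str, Optional[float]], criteria: Dict[str, Criterion]
) -> Tuple[float, bool, bool]:
    """Return (Fs, acceptable, renormalized).

    fp maps each parameter name to its mapped score Fp, or None when the
    value is missing for this system.
    """
    total_weight = sum(c.weight for c in criteria.values())
    present_weight = 0.0
    fs = 0.0
    acceptable = True
    for name, criterion in criteria.items():
        score = fp.get(name)
        if score is None:
            continue  # missing value: leave it out of the sum
        present_weight += criterion.weight
        fs += criterion.weight * score
        if score < criterion.minimum_fp:
            acceptable = False  # below a minimum: system is nonacceptable
    renormalized = 0.0 < present_weight < total_weight
    if renormalized:
        fs *= total_weight / present_weight  # assumed renormalization rule
    return fs, acceptable, renormalized
```

With two equally weighted parameters scored 4.0 and 2.0, this gives Fs = 3.0; if the second value is missing, the renormalized result is 4.0 and the system is flagged as renormalized.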


Selection  of  Brassboard  Applicable  Systems 

The Fs for each applicable system is computed in this section.
Tables 5-1(a) through 5-1(l) show these computations. Table 5-2
tabulates these values and shows which systems use estimated values,
which systems are renormalized for absent data, and which systems are
nonacceptable, with the reasons for nonacceptability. As shown in
Table 5-2, the top three candidates for the brassboard insertion are
the (1) MNSVS, (2) Manpack, and (3) ATMMRP vocoders, all from
Motorola. All three of these systems are 2400 b/s LPC vocoders.

They are all ANDVT compatible; therefore, they will interface with the
systems the Air Force is currently procuring for its low data rate,
secure voice communication systems.
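The ranking step itself is simple arithmetic on Table 5-2: drop the systems coded nonacceptable and sort the rest by Fs. The short Python check below encodes the Fs values and nonacceptability codes from that table (the data structure is only an illustration) and reproduces the same top three.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    system: str
    fs: float
    nonacceptable: bool


# Fs values and "*" codes as listed in Table 5-2.
NEAR_TERM = [
    Row("CV-3591 (ANDVT) (ITT)", 3.585, False),
    Row("ATMMRP (Motorola)", 4.005, False),
    Row("Manpack (Motorola)", 4.285, False),
    Row("MNSVS (Motorola)", 5.085, False),
    Row("CV-3333/U (E-Systems)", 2.665, True),
    Row("CV-3333A/U (E-Systems)", 3.255, False),
    Row("CV-3670/A (E-Systems)", 3.595, True),
    Row("LPC-24 (E-Systems)", 3.695, False),
    Row("MRD-2000G (GTE)", 3.425, False),
    Row("UVD-2000 (GTE)", 3.645, False),
    Row("CV-3832 (MRVT) (GTE)", 2.905, True),
    Row("Vector Quantized LPC (TRW)", 2.816, True),
]


def top_candidates(rows: List[Row], n: int = 3) -> List[str]:
    """Highest-Fs systems among those not coded nonacceptable."""
    acceptable = [r for r in rows if not r.nonacceptable]
    ranked = sorted(acceptable, key=lambda r: r.fs, reverse=True)
    return [r.system for r in ranked[:n]]
```

Nonacceptable systems are excluded regardless of score; the CV-3670/A, for example, scores 3.595 but carries the nonacceptable code.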

When these vocoders are compared, the only major differences are in
size, weight, and power requirements. These vary for two basic reasons:
the units are designed for different applications, and they employ
different chip fabrication technologies. The ATMMRP is a desk-top unit
similar in appearance to a standard "call director telephone" (89).
It is intended for use in a fixed-location secure voice network, and it
also interfaces with the Executive Secure Voice Network (ESVN). The
Manpack employs an extension of the ATMMRP chip set and is designed as
a portable unit for secure voice radio communication systems. The MNSVS
is a single, handheld unit similar


[Tables 5-1(a) through 5-1(l): parameter figure of merit (Fp) mapping
and overall system figure of merit (Fs) computations for the near-term
systems, including the CV-3591 Advanced Narrowband Digital Voice
Terminal (ITT Defense Communications Division), the ATMMRP, Manpack,
CV-3333/U (overall Fs total 2.665), CV-3333A/U, and LPC-24 vocoders,
the Model UVD-2000 Voice Digitizer (GTE Systems, Communication Systems
Division), and the 2400 to 800 b/s LPC Rate Converter (Vector Quantized
LPC) (NRL with TRW Corp.).]


TABLE 5-2

NEAR-TERM SYSTEM COMPARISONS

System                          Fs       Codes†     Comments
CV-3591 (ANDVT) (ITT)           3.585    E
ATMMRP (Motorola)               4.005    E          #3
Manpack (Motorola)              4.285    E          #2
MNSVS (Motorola)                5.085    E          #1
CV-3333/U (E-Systems)           2.665    *, E
CV-3333A/U (E-Systems)          3.255    E
CV-3670/A (E-Systems)           3.595    *, E
LPC-24 (E-Systems)              3.695    E
MRD-2000G (GTE)                 3.425    E
UVD-2000 (GTE)                  3.645    E
CV-3832 (MRVT) (GTE)            2.905    *, E
Vector Quantized LPC (TRW)      2.816    *, E, R

†Codes: * = Nonacceptable system
        E = Estimated values used
        R = A renormalized Fs for missing values
to a "contempra" telephone handset. The size and power reductions
are achieved through the use of flatpack and leaded chip carrier
technology rather than DIP chips (89).

A  more  detailed  description  of  these  vocoders  is  given  in  the 
following  excerpt  from  the  Motorola,  Inc.,  Product  Information  Report 
(89). 

ADVANCED  TECHNOLOGY  MODEL  MULTI  RATE  PROCESSOR  LPC  VOCODER 

The Voice Processing Laboratory located at the Motorola
Government Electronics Group in Scottsdale, Arizona, has applied
low power LSI digital signal processing capabilities in the
development of an Advanced Technology Model Multirate Processor
(ATMMRP) LPC vocoder. Developed by Motorola for the Naval
Electronics System Command, it is compatible with the
Advanced Narrowband Digital Voice Terminal (ANDVT) or the Executive
Secure Voice Network (ESVN).

The ATMMRP Vocoder is a low power, small size, 2400 bit per
second, full duplex, linear predictive voice coder. The voice
coder requires approximately 2.2 watts, while the LED displays and
specialized MIL 188 digital outputs require another 2 watts, for a
total DC power of 4.2 watts. This is the lowest power LPC Vocoder
yet reported. The ATM voice coder weighs 4.15 kg (9.1 lbs.) and
occupies 7,200 cc (440 in3), with the footprint of a typical call
director telephone; see Figure 1-1 [not included in this report].

All high performance digital signal processing performed by
the ATM LPC vocoder is done with custom large scale integration.
Voicing, serialization of data, sync acquisition and sync
maintenance algorithms, and self-test functions are performed in
software in an MC68000 microcomputer.

The ATM LPC vocoder utilizes a family of CMOS integrated
circuits that feature a low-cost, low-power, small physical size
approach to measurement of the speech parameters. The
CMOS DSP IC family consists of an LPC analysis IC, an AMDF pitch
extraction IC, and an LPC synthesis IC.

Each IC is a microprogrammed digital signal processor. The
analysis (transmit) ICs contain internal ROM programming to
process speech directly from an A/D converter and generate parameters
common to LPC/RELP vocoder algorithms. Coding of the resultant output
data is generalized for maximum flexibility.

The Linear Predictive Coding (LPC) Analyzer IC performs PARCOR
linear predictive analysis of speech for LPC vocoder and speech
recognition systems. This consists of estimating and removing
cross-correlation between forward and backward traveling waves in
a lattice digital model of the vocal tract. The purpose of the
LPC analysis chip is to perform all of the computationally
intensive calculations for 10-pole LPC analysis and energy measurement
on a single integrated circuit.


Residual  speech  output  for  pitch  extraction  and  RELP  coding  is 
also  provided.  A  block  diagram  showing  the  architecture  of  the 
analyzer  IC  is  shown  in  Figure  1-2  [Figure  5-1  in  this  report]. 

The Average Magnitude Difference Function (AMDF) IC is a high
performance digital signal processor, programmed to perform the pitch
measurement algorithm used in all Department of Defense LPC-10
vocoders, speaker identification systems, and many speech
recognition systems. AMDF is a robust algorithm for measuring pitch
periods of speech by finding the time delay at which the speech
waveform is most repetitive. The time delay that produces a
minimum AMDF is the pitch period. The AMDF operates directly on
speech from an A/D converter and outputs results to a host CPU.

By performing the AMDF analysis in a dedicated integrated
circuit, the computation rate associated with pitch and voicing
analysis is dramatically reduced. Furthermore, the I/O structure of
the AMDF chip is designed to minimize interface requirements on
the host processor. Even the simplest host processors can utilize
the computational power of the AMDF IC. The architecture of the AMDF
IC is shown in Figure 1-3 [Figure 5-2 in this report].

The voice synthesizer integrated circuit (IC) is a
microprogrammable CMOS digital signal processor programmed to
perform linear predictive coding (LPC) voice synthesis. High quality
voice synthesis may be used with residual excitation to achieve a
high degree of naturalness in residual excited LPC applications.
It also contains sufficient circuitry to operate on internal
excitation for pitch-excited LPC applications.

The  speech  synthesizer  IC,  like  the  others,  is  designed  to 
perform  all  the  computationally  intensive  arithmetic  for  speech 
synthesis  while  minimizing  the  load  on  the  host  processor. 

The microprogramming features of this IC allow it to be used
for lattice all-pole filters, lattice all-zero filters, general
second-order cascaded sections (FORMANT synthesis), or line
spectral pair synthesis (LSP). In addition, it can be
microprogrammed to perform special function filters such as
bandpass or low-pass filters. The architecture of the synthesizer IC
is shown in Figure 1-4 [Figure 5-3 in this report].

A Manpack Portable LPC-10 Vocoder

A manpack portable LPC-10 Vocoder has been developed which
makes substantial size and power performance improvements over
existing LPC Vocoders by extending the ATMMRP chip set, as shown
in Figure 1-5 [not included in this report]. The remaining LPC-10
algorithmic components are partitioned by the data and process
flow graphs into meaningful multi-purpose stand-alone single-chip
computers, resulting in a vocoder that uses 3 VLSI and 3 LSI
components. The digital signal processing algorithms are partitioned
as follows: LPC Analysis IC, LPC Synthesis IC, AMDF Pitch Extraction
IC. The data flow processes are partitioned into microcomputers
as follows: Transmit Pitch and Voicing in processor #1,
Transmit AGC in processor #2, and Parameter Quantization and
Serialization in processor #3; in the receive mode, sync acquisition
and maintenance and parameter deserialization in processor


Figure 5-1. Architecture of LPC PARCOR analyzer.
(Taken from ref. 89, p. 4)

Figure 5-2. Architecture of AMDF IC.
(Taken from ref. 89, p. 4)

Figure 5-3. Architecture of synthesizer IC.
(Taken from ref. 89, p. 4)
#3, error correction and dequantization in processor #2, and
interpolation rule implementation in processor #1.

The data flow processor partitions were greatly affected by
the use of single chip computers. These computers have very
limited RAM and ROM space, which makes the partitioning dependent on
program size. The use of single chip computers minimizes the external
hardware necessary for the vocoder implementation.

The overall block diagram is presented in Figure 1-6
[Figure 5-4 in this report]. The three VLSI speech processing
chips, the three microprocessors, and the analog input and output
logic comprise the entire half duplex system.

The latest algorithms have been used in Manpack to optimize
performance for noisy rapid communication. To accomplish this,
the voiced/unvoiced and pitch tracking algorithms underwent
considerable design improvement. Similarly, a high performance
automatic gain control has been designed specifically
for the rapid communication environment.

Miniaturized  Narrowband  Secure  Voice  System 

Motorola is now investigating a further miniaturization of the
Manpack vocoder by using flatpack and leaded chip carrier
technology rather than dual inline packages. The result will be an
LPC Vocoder that fits into a "contempra" telephone handset. This
further size reduction makes possible an entirely new market for
LPC vocoders, owing to its small size, portability, and flexibility to
be used with a variety of modern technologies. Furthermore, MNSVS
also contains a microprocessor based KG controller to enable a
variety of secrecy levels of KG to be used with the vocoder. The KG
control microprocessor mediates link synchronization in non error
extending mode with bit error rates up to 10^-2. A photograph of
MNSVS is shown in Figure 1-7 [not included in this report].
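The AMDF pitch measurement described in the excerpt above can be illustrated with a short Python sketch. For each candidate delay it averages the magnitude of the difference between the signal and a delayed copy of itself, and returns the delay giving the minimum. It is a simplified version: the lag range used in the example (20 to 160 samples, about 400 Hz down to 50 Hz at an assumed 8 kHz sampling rate) is chosen for illustration, and the low-pass filtering, lag schedule, and voicing logic of an actual LPC-10 implementation are not reproduced.

```python
from typing import Sequence


def amdf(x: Sequence[float], lag: int, window: int) -> float:
    """Average magnitude difference between x and x delayed by lag samples."""
    return sum(abs(x[n] - x[n + lag]) for n in range(window)) / window


def amdf_pitch_period(x: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Lag (in samples) at which the waveform is most repetitive."""
    window = len(x) - max_lag
    if window <= 0:
        raise ValueError("signal too short for the requested lag range")
    best_lag, best_value = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        value = amdf(x, lag, window)
        # Strict '<' keeps the shortest lag when minima tie, so an exact
        # multiple of the period does not displace the period itself.
        if value < best_value:
            best_lag, best_value = lag, value
    return best_lag
```

Multiples of the true period also produce deep minima, which is why the comparison keeps the first (shortest) minimum; for a waveform that repeats exactly every 40 samples (200 Hz at 8 kHz), the function returns 40.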


Selection  of  LASARS-Appl i cable  Systems 

Tables 5-3(a) through 5-3(q) show the Fs calculations for the far-term applicable systems, and the resulting values are tabulated in Table 5-4. As the table shows, none of the systems under consideration achieves all of the currently specified parameter values, and all are categorized as nonacceptable for a LASARS implementation.

The Air Force's Frame Predictive LPC studies with ITT, the MNSVS from Motorola, the Compact LPC Vocoder from Lincoln Laboratory, and the TI VIS-Speech Processor (Fs approximately tied with the Compact LPC vocoder) lead the list of systems considered realizable for production and inclusion in a LASARS implementation. None of these combines all of the desirable attributes. The Frame Predictive algorithm, along with the NRL/TRW Vector Quantized system (Fs = 0.777) and several other (nonavailable) systems listed in Table 4-2, proves that the desired data rates of 800 b/s and less are possible.

[Tables 5-3(a) through 5-3(q): Fs worksheets for the individual far-term systems, beginning with the CV-3591 Advanced Narrowband Digital Voice Terminal (ANDVT) from ITT Defense Communications Division and including the Manpack, MNSVS, Vector Quantized LPC, Frame Predictive LPC, and VIS-Speech Processor. The overall system figures of merit are summarized in Table 5-4.]

TABLE 5-4

FAR-TERM SYSTEM COMPARISONS

System                        Fs     Codes†  Comments
CV-3591 (ANDVT) (ITT)         0.900  *
ATMMRP (Motorola)             1.330  *
Manpack (Motorola)            1.920  *
MNSVS (Motorola)              2.660  *
CV-3333/U (E-Systems)         0.890  *
CV-3333A/U (E-Systems)        1.390  *
CV-3670/A (E-Systems)         1.390  *
LPC-24 (E-Systems)            1.390  *
MRG-2000G (GTE)               1.370  *
UVD-2000 (GTE)                1.370  *
CV-3832 (MRVT) (GTE)          1.370  *
Vector Quantized LPC (TRW)    0.777  *
Frame Predictive LPC (ITT)    2.942  *
Compact LPC (LL)              2.080  *       #3, no card packaging included
Channel (LL)                  0.862  *
Frame Fill LPC (LL)           0.270  *,E
VIS-Speech Processor (TI)     2.060  *       Virtually tied with Compact LPC, no card packaging included

†Codes: * = Nonacceptable; E = Estimated; R = A renorma...
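As a quick cross-check of the ranking described above, the Python sketch below sorts the Fs values transcribed by hand from Table 5-4 and reports the leaders. The dictionary is only a transcription of the table. The Fs scores order the systems but do not by themselves make any of them acceptable, since none meets all of the minimum requirements.

```python
# Fs values transcribed by hand from Table 5-4 (far-term systems).
FS_TABLE_5_4 = {
    "CV-3591 (ANDVT) (ITT)": 0.900,
    "ATMMRP (Motorola)": 1.330,
    "Manpack (Motorola)": 1.920,
    "MNSVS (Motorola)": 2.660,
    "CV-3333/U (E-Systems)": 0.890,
    "CV-3333A/U (E-Systems)": 1.390,
    "CV-3670/A (E-Systems)": 1.390,
    "LPC-24 (E-Systems)": 1.390,
    "MRG-2000G (GTE)": 1.370,
    "UVD-2000 (GTE)": 1.370,
    "CV-3832 (MRVT) (GTE)": 1.370,
    "Vector Quantized LPC (TRW)": 0.777,
    "Frame Predictive LPC (ITT)": 2.942,
    "Compact LPC (LL)": 2.080,
    "Channel (LL)": 0.862,
    "Frame Fill LPC (LL)": 0.270,
    "VIS-Speech Processor (TI)": 2.060,
}


def rank_by_fs(table):
    """Return (name, Fs) pairs sorted from highest to lowest Fs."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_fs(FS_TABLE_5_4)
for place, (name, fs) in enumerate(ranked[:4], start=1):
    print(f"{place}. {name}: Fs = {fs:.3f}")
```

The 0.020 gap between the Compact LPC vocoder and the VIS-Speech Processor is the "virtual tie" noted in the table comments.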

The MNSVS, Compact LPC, and VIS-Speech Processor prove that the small size required in tactical aircraft is also achievable. Therefore, with a slight effort at combining the appropriate technologies, a small, low-rate vocoder should be possible. The Frame Predictive LPC algorithm will probably require an extra microprocessor or signal processing chip with some additional memory circuits, which could then be added to a system such as the MNSVS, Compact LPC, or VIS-Speech Processor in order to meet the desired specifications.

See the previous excerpt for a discussion of the MNSVS. Additional information on its parameters, characteristics, and chip functions is available from the Motorola Marketing Division (Vicki Crain [18]) or from Bruce Fette [30]. The Lincoln Laboratory Compact LPC vocoder is presented in detail in Feldman et al. [29] and in Blakenship [10], Gold [44], and Paul [100]. All of the information on the TI VIS-Speech Processor was obtained by telephone in a private conversation with Langston [72] and is included in Chapter IV.


CHAPTER  VI 


CONCLUSIONS  AND  RECOMMENDATIONS 


Summary 

This research effort has identified, as thoroughly as possible, the current state of technology in vocoder research and production. It has presented an overview of LPI communications and of how vocoders form a cornerstone of the LPI conceptual design study. It has described how vocoders operating within the communication link provide significant gains toward operation over a marginal channel. The speech waveform was discussed in order to provide some insight into the problems vocoder developers face in researching low-data-rate voice communication methods. The thesis then presented the seven major forms of vocoder algorithms, or approaches to speech analysis/synthesis.

In this thesis, a method for quantitatively comparing one system to another was developed, with a discussion of each quantitative parameter. Each of the thirty-eight systems identified was presented in table form. A "first-cut" screening eliminated all of the nonavailable systems or methods. The remaining systems were individually presented and discussed.

Finally, the comparison method was applied to the available systems in order to identify those most suited for the brassboard effort, to be tested in late 1986 or in 1987, and for the LASARS, to be in production and implementable between 1994 and 1996.

Conclusion 

At this time, several vocoder systems exist either as working models or as production equipment. Of these, three have been identified as the most applicable to the brassboard effort. The objective of this phase of the LPI Comm ADP is to flight test the LPI concepts. The tests will be performed with rack-mounted equipment in the cargo section of an Air Force cargo transport-type aircraft. Considering this, the Manpack or the MNSVS is the most appropriate system to use. The design of the Manpack, as shown in Figure 5-3, lends itself to rack mounting. The MNSVS could be attached and then held in a holster when not in use. The ATMMRP would require special mounting to hold it in place.

Currently, no vocoder exactly fills the needs of the far-term effort. Several options exist, one or more of which will have to be implemented in order to obtain a production-model vocoder that fits the LPI needs by 1993. First, the minimum specifications could be reviewed and relaxed so that current models will suffice as production equipment. This would include a detailed analysis of the LASARS applications to determine which parameters could or should be modified. Second, after the brassboard tests are concluded, and if the LPI concept proves viable, a new review of vocoder technology could be conducted with a modified LASARS production schedule. Finally, additional research funds could be channeled into vocoder research efforts with the express purpose of combining the low-rate algorithms with the small-size technology.

Recommendations 

The recommendations included here are general in nature and are presented as a first consideration for the LPI Comm ADP managers and the LPI Comm Conceptual Design Study contractors. In the brassboard implementation, the Motorola Manpack should be used. The only significant differences between it and the higher-scoring (Fs) MNSVS are size, weight, and power requirements. The Manpack should be chosen because it is rack mountable, which makes it more rugged for use in a test environment. Procurement should probably be initiated as soon as possible because the system exists only as an engineering prototype model.

The recommendations for the far-term effort are somewhat harder to make. Additional money should be spent to advance the level of vocoder technology. This money should not be immediately dedicated to the Air Force contractors currently providing vocoder research to the Air Force. Rather, additional organizations such as Motorola, E-Systems, TI, etc. should have an opportunity to bid on this research because they have extensive speech processing research capabilities.
SPEECH BANDWIDTH COMPRESSION GAINS
IN LPI COMMUNICATIONS

An LPI communications system attempts to maximize the likelihood of correct reception of voice and/or data transmissions by an intended receiver while minimizing the likelihood that an intercept receiver will detect the communication process in progress. This is accomplished with a communications system incorporating any or all of the techniques mentioned in Chapter I. The goal is to operate at the lowest RF energy level necessary to convey the information to the intended receiver. As the list of technologies under investigation indicates, this process will probably be an adaptive one, continuously changing the operating characteristics within a closed-loop communications situation. The vocoder used for speech bandwidth compression is expected to yield significant gains toward the LPI capabilities of the composite LPI system.

A vocoder fits into the communications link as previously described in Chapter II. The vocoder will replace the PCM, ADPCM, DM, etc., waveform coders now used in the source encoding portion of digital radio systems (see Figure 2-1). The source encoder outputs the vocoder data at a specific rate, R_V. The encryption process rearranges the data or combines it with a sequence of information known to the receiver to prevent unwanted interpretation or decoding of the data. The channel encoder generally adds bits to the data stream, in proportion to the incoming rate, for error protection/correction purposes. The actual number of bits added depends upon the amount of error protection desired. The channel encoder output rate, R, is the system data rate. Because R is proportional to R_V, the vocoder establishes the overall system bit rate.
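The rate chain just described can be sketched in Python. The sketch assumes a fixed channel code of rate k/n (k data bits in, n coded bits out) and an encryption stage that leaves the bit rate unchanged. The 2400 b/s and 600 b/s vocoder rates and the rate-1/2 code are illustrative choices, not values taken from the text.

```python
def system_rate_bps(vocoder_rate_bps: float, code_rate: float) -> float:
    """System data rate R at the channel encoder output.

    Assumes a channel code of rate code_rate = k/n, so the encoder adds
    bits in proportion to the incoming rate, and assumes encryption does
    not change the bit rate.
    """
    if not 0.0 < code_rate <= 1.0:
        raise ValueError("code rate must be in (0, 1]")
    return vocoder_rate_bps / code_rate


# Hypothetical comparison: the same rate-1/2 code behind two vocoder rates
for rv in (2400.0, 600.0):
    print(rv, "b/s vocoder ->", system_rate_bps(rv, 0.5), "b/s on the channel")
```

Because the code rate is a fixed ratio, any reduction in R_V carries straight through to R, which is why the vocoder sets the system rate.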

The communications channel is fixed, so the processing gain, PG, is the variable directly affected by the data rate. This in turn affects the maximum coherent reception and interception ranges. Starting with the range equations, the processing gain can be derived to show the LPI gains achieved by reducing the data rate.

The coherent reception range, R_R, is defined as the maximum range at which an intended receiver may detect the communication signal. The interception range, R_I, is defined as the maximum range at which an unintended receiver may detect the communication emissions. The free-space equations expressing these ranges in terms of system parameters are given below (54).* The range for intended reception is:


$$R_R^2 = \frac{P_T t_0 G_{TR} G_{RR} \lambda^2}{(4\pi)^2 k T_R F_R L_R d_R} \qquad \text{(A1-1)}$$

*The kTF terms in the denominators assume ideal conditions.
and the range for interception is:

$$R_I^2 = \frac{P_T G_{TI} G_{RI} \lambda^2}{(4\pi)^2 k T_I F_I L_I d_I B_I} \qquad \text{(A1-2)}$$


where P_T is the transmitter peak power, t_0 is the coherent integration time, G_T and G_R are the appropriate transmit and receive antenna gains, respectively, λ is the carrier wavelength, k is Boltzmann's constant, T is the appropriate receiver noise temperature, F is the appropriate receiver noise figure, L is the appropriate receiver loss figure, d is the appropriate signal-to-noise power ratio (SNR) required for detection (after coherent processing), and B is the appropriate receiver noise bandwidth. The subscripts R and I denote the intended-reception and interception cases, respectively.

Multiplying the numerator and denominator of (A1-1) by the receiver bandwidth, B_R, introduces the time-bandwidth product, D_R, where:

$$D_R = t_0 B_R \qquad \text{(A1-3)}$$

Equation (A1-1) can now be rewritten as:

$$R_R^2 = \left[\frac{G_{TR} G_{RR} \lambda^2}{(4\pi)^2 k T_R F_R L_R d_R B_R}\right] D_R P_T \qquad \text{(A1-4)}$$


Letting the term in brackets equal a constant, M_R, reduces (A1-4) to:

$$R_R^2 = M_R D_R P_T \qquad \text{(A1-5)}$$


Equation (A1-2) can be rearranged as:

$$R_I^2 = \left[\frac{G_{TI} G_{RI} \lambda^2}{(4\pi)^2 k T_I F_I L_I d_I B_I}\right] P_T \qquad \text{(A1-6)}$$

or, with M_I denoting the bracketed term,

$$R_I^2 = M_I P_T \qquad \text{(A1-7)}$$


The maximum operating range, R_R, is determined by system and mission requirements. For this research effort, worst-case conditions are assumed: the coherent receiver and the interception receiver have identical system characteristics, so that M_R = M_I. In actuality these values will not be equal, but because the terms of each expression are essentially constants, they will be fixed and therefore proportional to each other. The assumption simplifies the equations. Dividing (A1-5) by (A1-7) relates the coherent receiver range to the intercept receiver range:

$$\frac{R_R^2}{R_I^2} = \frac{M_R D_R P_T}{M_I P_T} \qquad \text{(A1-8)}$$

resulting in:

$$\frac{R_R}{R_I} = \sqrt{D_R} \qquad \text{(A1-9)}$$
LPI communication techniques attempt to maximize this ratio. The ratio is generally much greater than one because the coherent receiver has some a priori knowledge about the signal being transmitted: it knows the format of the signal and how to process it to get maximum value. This leads to the concept of processing gain, PG, which is an alternative designation for the time-bandwidth product, D_R. Therefore,

$$D_R = PG \qquad \text{(A1-10)}$$

where PG is a function of the RF bandwidth and the integration time, or bit duration.

Maximizing PG maximizes (A1-9) and, therefore, (A1-8). Relating (A1-10) to (A1-5), if R_R^2 is fixed and M_R is a constant, then maximizing D_R minimizes P_T, the required transmitter power. Reducing P_T in (A1-7) reduces the interception range, R_I, making the system less susceptible to interception, as desired.
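To make this scaling concrete, the Python sketch below traces (A1-5) and (A1-7) through a reduction in data rate. It assumes, as above, fixed R_R, M_R, and M_I, a fixed receiver bandwidth, and t_0 equal to the bit duration 1/R, so that D_R scales as 1/R. The 2400 b/s and 600 b/s rates are illustrative only.

```python
import math


def lpi_scaling(old_rate_bps: float, new_rate_bps: float):
    """Relative effect of a data-rate change at fixed R_R, M_R, M_I and B_R.

    Returns (D_R ratio, P_T ratio, R_I ratio), each expressed as new/old.
    """
    # D_R = t0 * B_R with t0 = 1/R, so at fixed B_R, D_R scales as 1/R
    d_ratio = old_rate_bps / new_rate_bps
    # (A1-5): R_R^2 = M_R * D_R * P_T with R_R, M_R fixed -> P_T scales as 1/D_R
    power_ratio = 1.0 / d_ratio
    # (A1-7): R_I^2 = M_I * P_T -> R_I scales as sqrt(P_T)
    intercept_range_ratio = math.sqrt(power_ratio)
    return d_ratio, power_ratio, intercept_range_ratio


d, p, ri = lpi_scaling(2400.0, 600.0)
print(d, p, ri)                                                       # 4.0 0.25 0.5
print(round(10 * math.log10(p), 2), "dB change in transmitter power")  # -6.02
```

Under these assumptions, cutting the rate by a factor of four quarters the required transmitter power and halves the interception range.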

The processing gain can often be seen more clearly in the SNR equations, which show that increasing PG decreases the required RF SNR. The processing gain, or time-bandwidth product, is:

$$PG = t_0 B \qquad \text{(A1-11)}$$

or, since t_0 is the bit duration, 1/R,

$$PG = \frac{B}{R} \qquad \text{(A1-12)}$$

where B (written B_RF below) is the receiver RF bandwidth. The RF SNR is:

$$\left(\frac{S}{N}\right)_{RF} = 10 \log\left(\frac{s}{n}\right) \qquad \text{(A1-13)}$$

where S is the RF signal power in dB, N is the RF noise power in dB, s is the RF signal power in watts, and n is the RF noise power in watts. Now:

$$s = e_b R \qquad \text{(A1-14)}$$

and

$$n = n_0 B_{RF} \qquad \text{(A1-15)}$$

where e_b is the energy per bit, R is the data rate, and n_0 is the noise power per hertz of bandwidth, so that:

$$\left(\frac{S}{N}\right)_{RF} = 10 \log\left(\frac{e_b R}{n_0 B_{RF}}\right) \qquad \text{(A1-16)}$$


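This can be rewritten as:

$$\left(\frac{S}{N}\right)_{RF} = 10 \log\left[\left(\frac{e_b R}{n_0 B_{bb}}\right)\left(\frac{B_{bb}}{B_{RF}}\right)\right] \qquad \text{(A1-17)}$$

where B_bb is the baseband bandwidth, which depends upon the modulation scheme used. (Here BPSK is assumed, so that |R| = |B_bb| according to Nyquist's theory as given in [118].) Now, with

$$e_b R = s_{bb} \qquad \text{(A1-18)}$$

and

$$n_0 B_{bb} = n_{bb} \qquad \text{(A1-19)}$$

where s_bb is the baseband signal power in watts and n_bb is the baseband noise power in watts, the RF SNR is given by:

$$\left(\frac{S}{N}\right)_{RF} = 10 \log\left[\left(\frac{s_{bb}}{n_{bb}}\right)\left(\frac{B_{bb}}{B_{RF}}\right)\right] \qquad \text{(A1-20)}$$

Using B_bb = R, this can be rewritten as:

$$\left(\frac{S}{N}\right)_{RF} = 10 \log\left(\frac{s_{bb}}{n_{bb}}\right) + 10 \log\left(\frac{R}{B_{RF}}\right) \qquad \text{(A1-21)}$$

or alternatively

$$\left(\frac{S}{N}\right)_{RF} = 10 \log\left(\frac{s_{bb}}{n_{bb}}\right) - 10 \log\left(\frac{B_{RF}}{R}\right) \qquad \text{(A1-22)}$$

Since the data rate, R, is the reciprocal of the time, t_0, the second logarithmic term in (A1-22) is the time-bandwidth product in dB. The terms in (A1-22) can be expressed as:

$$10 \log\left(\frac{s_{bb}}{n_{bb}}\right) = \left(\frac{S}{N}\right)_{bb} \qquad \text{(A1-23)}$$

which is the baseband SNR in dB, and

$$10 \log\left(\frac{B_{RF}}{R}\right) = PG \qquad \text{(A1-24)}$$

which is the processing gain in dB. Now (A1-22) can be written as:

$$\left(\frac{S}{N}\right)_{RF} = \left(\frac{S}{N}\right)_{bb} - PG \qquad \text{(A1-25)}$$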

In (A1-25), the baseband SNR is determined by the information extraction circuits of a receiver. This value is the minimum SNR required to interpret the data correctly with the desired probability. In the final analysis, (A1-25) shows that decreasing the bit rate increases PG, so a correspondingly lower SNR is acceptable at the receiver front end, which allows the transmitter output power to be reduced. Therefore, vocoder data rate reductions for speech bandwidth compression are directly applicable to LPI communication systems.
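Equations (A1-24) and (A1-25) can be evaluated directly. The Python sketch below computes the processing gain and the required RF SNR for two data rates. The 10 dB baseband SNR requirement and the 1 MHz RF bandwidth are hypothetical link values chosen only for illustration.

```python
import math


def processing_gain_db(rf_bandwidth_hz: float, data_rate_bps: float) -> float:
    """(A1-24): PG = 10 log(B_RF / R), in dB."""
    return 10.0 * math.log10(rf_bandwidth_hz / data_rate_bps)


def required_rf_snr_db(baseband_snr_db: float, rf_bandwidth_hz: float,
                       data_rate_bps: float) -> float:
    """(A1-25): (S/N)_RF = (S/N)_bb - PG, all quantities in dB."""
    return baseband_snr_db - processing_gain_db(rf_bandwidth_hz, data_rate_bps)


# Hypothetical link: 10 dB baseband SNR requirement, 1 MHz RF bandwidth
for rate_bps in (2400.0, 600.0):
    snr = required_rf_snr_db(10.0, 1.0e6, rate_bps)
    print(f"{rate_bps:.0f} b/s: PG = {processing_gain_db(1.0e6, rate_bps):.1f} dB, "
          f"required RF SNR = {snr:.1f} dB")
```

Under these assumptions, the 4:1 rate reduction yields about 6 dB (10 log 4) of additional processing gain, which the transmitter can give back as lower output power.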
DESCRIPTION OF THE SPEECH WAVE


Speech is the acoustic end product of voluntary, formalized motions of the respiratory and masticatory apparatus. It is a motor behavior that must be learned. It is developed, controlled, and maintained by the acoustic feedback of the hearing mechanism and by the kinesthetic feedback of the speech musculature. Information from these senses is organized and coordinated by the central nervous system and used to direct the speech function. Impairment of either control mechanism usually degrades the performance of the vocal apparatus (33).

The purpose of speech analysis-synthesis systems is to efficiently encode the sounds of speech, transmit and receive this encoded signal, and decode the signal into perceptually significant sound. To best understand the various analysis-synthesis techniques, a reasonable understanding of the characteristics of the acoustic (speech) waveform is needed. The speech waveform can be characterized as the response of a slowly time-varying system to either a quasi-periodic or a noise-like excitation.

More specifically, the speech-production mechanism consists essentially of an acoustic tube, the vocal tract, excited by an appropriate source to generate the desired sound. For voiced speech sounds, the excitation corresponds to a quasi-periodic pulse train representing the airflow through the vocal cords as they vibrate. Fricative sounds are generated by forcing air through a constriction in the vocal tract, thereby creating turbulence, which produces a source of noise to excite the vocal tract (94).

The fricative sounds mentioned above are classed as unvoiced speech; this category also includes plosives. The voiced sounds include vowels, nasals, and glides. Unvoiced sounds are the modulation of a noise-like excitation with the spectral envelope. Voiced sounds (vowels) are the modulation of a more-or-less periodic excitation (vocal cord vibration with fundamental frequency equal to 1/excitation period) with the spectral envelope. Figures A2-1, A2-2, and A2-3 show a relatively long segment of speech containing samples of the various types of sounds. Figures A2-4 and A2-5 depict representations of these types of sounds.

The vocal tract can be assumed to be a linear, time-varying system. If the vocal-tract shape is fixed, or nearly so (slowly varying), the output of the system, the speech waveform s(t), is approximated fairly accurately as the convolution of the excitation source, e(t), and the vocal-tract impulse response, v(t):

$$s(t) = e(t) * v(t) \qquad \text{(A2-1)}$$

In other words, the Fourier transform (spectrum) of the output is the product of the spectra of the excitation function and the vocal-tract impulse response:

$$S(f) = E(f)\,V(f) \qquad \text{(A2-2)}$$
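Equations (A2-1) and (A2-2) can be checked with a discrete stand-in for this source-filter model. The sketch below assumes an 8 kHz sampling rate, an impulse-train excitation at a 100 Hz pitch, and a single damped 500 Hz resonance as a toy vocal-tract impulse response. These values are illustrative, not measured. Linear convolution of sequences of lengths L1 and L2 has length L1 + L2 - 1, so both transforms are zero-padded to that length, which makes the discrete product E·V match the transform of e * v exactly.

```python
import numpy as np

fs = 8000                                # assumed sampling rate, Hz
pitch_hz = 100                           # assumed pitch (inside the 50-400 Hz range)

# Excitation e(n): quasi-periodic pulse train, idealized as unit impulses
e = np.zeros(400)                        # 50 ms of excitation
e[:: fs // pitch_hz] = 1.0               # one pulse every 80 samples

# Toy vocal-tract impulse response v(n): one damped resonance at 500 Hz
t = np.arange(200) / fs
v = np.exp(-400.0 * t) * np.sin(2 * np.pi * 500.0 * t)

# (A2-1): speech as the convolution of excitation and impulse response
s = np.convolve(e, v)                    # length 400 + 200 - 1 = 599

# (A2-2): the output spectrum equals the product of the two spectra
n_fft = len(s)                           # zero-pad so circular = linear convolution
S = np.fft.fft(s, n_fft)
E = np.fft.fft(e, n_fft)
V = np.fft.fft(v, n_fft)
print(np.allclose(S, E * V))             # True
```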


The model is limited, and various difficulties can be noted. For example, the binary voicing decision does not provide for voiced fricatives (phonemes with simultaneous periodic and aperiodic excitation and a different filter for each excitation). The period of the periodic excitation may change rapidly or may be only quasi-periodic, either of which may cause sections of a short-term spectrum to be aperiodic. The filter itself may also change quite rapidly (99).
Figure A2-1. Beginning of speech waveform of the utterance "We pledge you some heavy treasure." (Taken from ref. 42, p. 1638)

Figure A2-4. Details of network representation of vowels. (Taken from ref. 46, p. 133)

Figure A2-5. S-plane representation of voiceless plosives and voiceless fricatives. (Taken from ref. 46, p. 133)

Figure A2-6. Model of speech production as the response of a quasi-stationary linear system: (a) time-domain characterization, s(t) = e(t) * v(t), and (b) frequency-domain characterization, S(ω) = E(ω)V(ω). (Taken from ref. 33, p. 121)
Figure A2-6 depicts the convolution and spectrum-product processes.

The spectrum of the vocal tract is a smooth, slowly varying function of frequency. The local maxima, shown in Figure A2-6b, correspond to the resonant frequencies of the acoustic cavity, commonly called formant frequencies or just formants. This slowly varying function is generally known as the speech envelope structure or spectral envelope, G(f,t). The quasi-periodic excitation function has a period of approximately T, which produces a spectrum of pulses spaced 2π/T apart in angular frequency (1/T in hertz). This fundamental frequency is the voice "pitch." The pitch varies relatively little for an individual speaker but varies significantly between speakers, from approximately 50 Hz in adult men to about 400 Hz in women and children. The quasi-periodic function is referred to as the speech fine structure, F(f,t).
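The line spacing of the fine structure can be checked numerically. The sketch below assumes an 8 kHz sampling rate and an ideal impulse train with a 10 ms period (a 100 Hz pitch, within the range quoted above). Over a 1 s record the DFT bins are 1 Hz apart, so the nonzero spectral lines should fall every 100 Hz, that is, at multiples of 1/T.

```python
import numpy as np

fs = 8000                     # assumed sampling rate, Hz
period = 80                   # excitation period T in samples (10 ms -> 100 Hz pitch)
n = fs                        # 1 s of signal, so DFT bins are 1 Hz apart

# Idealized voiced excitation: one unit impulse every T seconds
e = np.zeros(n)
e[::period] = 1.0

mag = np.abs(np.fft.rfft(e))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# Keep only the strong spectral lines (all other bins are numerically ~0)
lines = freqs[mag > 0.5 * mag.max()]
print(lines[:5])                                   # first few harmonics, Hz
print(np.diff(lines).min(), np.diff(lines).max())  # spacing equals 1/T = 100 Hz
```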

All of the characteristics mentioned above must be determined in some form. The pitch is determined separately from the voiced/unvoiced information, which in turn is determined separately from the frequency content and signal amplitude.
DESCRIPTION OF VOCODER TECHNIQUES

There are two different concepts in speech coding: waveform encoding and source encoding. Waveform encoding is essentially direct sampling and encoding of the speech waveform itself. This form attempts to completely model and quantize the wave and generally requires much higher data rates than source encoding. It is usually just an A/D conversion with appropriate resolution followed by the appropriate modulation scheme. Source encoding attempts to model some aspect of the vocalization/perception process, usually the vocal tract response function and the excitation function, at fairly low data rates. Vocoders are a form of source encoder. Figure A3-1 depicts the differences between the two forms. As the figure shows, each form has its own advantages and applications; the most significant difference is the speech quality at the receiver output. Since the data rates of waveform encoders exceed the requirements of this research effort, they are not discussed in this text. Source encoders are generally classified by the speech analysis techniques used. They are also often classified by their physical structure or sometimes by the parameters transmitted.

A source coder, or vocoder, attempts to analyze and characterize a speech waveform. The system must determine the type of sound (voiced or unvoiced), the excitation period or pitch if voiced, and the frequency energy content of the signal. The vocoder must then characterize these parameters in such a manner that a synthesizer can accept them and regenerate a speech wave as nearly identical as possible to the original wave analyzed. This appendix presents a detailed description of the seven major vocoder methods: the channel, formant, homomorphic, pattern-matching, phase, linear predictive coding, and spectral envelope estimation vocoders. Other, minor techniques exist which are generally slight modifications or combinations of one or more of these major methods.

Channel  Vocoder 

The  earliest  vocoder  dates  back  to  1928  when  Homer  Dudley  of 
Bell  Telephone  Laboratories  (115)  sketched  a  device  later  to  become 
known  as  the  "vocoder."  This  early  voice  coder  is  the  forerunner  to 
what  is  now  known  as  the  spectrum  channel  vocoder  or  channel  vocoder. 

[Figure A3-2. ... spectrum channel vocoder.]

The channel vocoder is depicted in Figure A3-2. It consists of a number of channels. Each of the spectral channels shown consists of a bandpass filter, a rectifier, and a low pass filter. The bandpass filters are arranged to contiguously cover the desired speech bandwidth, with an upper cutoff frequency usually between 3 kHz and 4 kHz. The end result of this series of channels is an estimate of the spectral envelope, |G(f,t)|. Because speech is a time-varying (only quasi-periodic) function (see Figures A2-1, A2-2, A2-3, and A2-6), infinite-time spectral analysis is not possible and is replaced by short-time spectral analysis. This method uses a time window of duration approximately equal to that of the shortest speech sounds, which has been determined to be approximately 40 ms. Within the window, speech is assumed to be stationary. Equation (A2-2) is then written as:

$$S(f,t) = E(f,t)\,V(f,t) \qquad \text{(A3-1)}$$

The spectrum can be expressed in terms of the spectral envelope, G(f,t), and the spectral fine structure, F(f,t), as:

$$S(f,t) = F(f,t)\,G(f,t) \qquad \text{(A3-2)}$$

The channel vocoder performs a short-time Fourier analysis. The Fourier transform of a discrete signal is given by:

$$X(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} x(nT)\,e^{-j\omega nT} \qquad \text{(A3-3)}$$

The short-time transform is given as:

$$X(\omega, nT) = \sum_{r=-\infty}^{\infty} x(rT)\,h(nT - rT)\,e^{-j\omega rT} \qquad \text{(A3-4)}$$

This equation is the infinite-time Fourier transform of the speech signal seen at time nT through a time window with response h(nT), as shown in Figure A3-3. The transform of the window response h(nT) is H(e^{jωT}). This response is usually chosen to approximate an ideal low pass filter. In the channel vocoder, this analysis is performed by filters operating on the analog signal.

If the bandpass filter response is limited to:

$$h_k(nT) = h(nT)\cos(\omega_k nT), \qquad \text{(A3-5)}$$

then the bandpass output is given by:

$$y_k(nT) = \sum_{r=-\infty}^{\infty} x(rT)\,h(nT - rT)\cos\left[\omega_k (nT - rT)\right],$$

or

$$y_k(nT) = \mathrm{Re}\left[e^{j\omega_k nT} X(\omega_k, nT)\right]. \qquad \text{(A3-6)}$$
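The equivalence of the two forms of (A3-6) can be confirmed numerically. Filtering with the modulated window h_k(nT) = h(nT)cos(ω_k nT) gives the same output as taking the real part of e^{jω_k nT} times the short-time transform of (A3-4). The sketch below assumes an 8 kHz sampling rate, a 64-point Hamming window as h(nT), a 1 kHz channel center frequency, and random samples standing in for speech. All of these are illustrative choices.

```python
import numpy as np

fs = 8000.0                      # assumed sampling rate, Hz (T = 1/fs)
T = 1.0 / fs
rng = np.random.default_rng(0)
x = rng.standard_normal(512)     # stand-in for the sampled speech x(rT)
h = np.hamming(64)               # assumed lowpass analysis window h(nT)
wk = 2 * np.pi * 1000.0          # assumed channel center frequency w_k, rad/s


def y_bandpass(n):
    """First form of (A3-6): sum of x(rT) h(nT - rT) cos[w_k (nT - rT)]."""
    r = np.arange(max(0, n - len(h) + 1), n + 1)   # r where h(nT - rT) is nonzero
    m = n - r
    return np.sum(x[r] * h[m] * np.cos(wk * m * T))


def y_short_time(n):
    """Second form of (A3-6): Re[e^{j w_k nT} X(w_k, nT)], with X from (A3-4)."""
    r = np.arange(max(0, n - len(h) + 1), n + 1)
    X = np.sum(x[r] * h[n - r] * np.exp(-1j * wk * r * T))
    return (np.exp(1j * wk * n * T) * X).real


print(all(np.isclose(y_bandpass(n), y_short_time(n)) for n in (63, 200, 511)))  # True
```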

The remaining channel makes the voiced/unvoiced (V/UV) decision and, if the speech is voiced, extracts the pitch. For voiced speech, the pitch signal sets the frequency of the pulse source, and the V/UV signal selects the pulse source as the excitation to be modulated with the estimated envelope. If the sound is unvoiced, the "white" noise source of excitation is chosen instead.

Controlling the number of channels helps control the bit rate. Between 14 and 20 channels appears to be the optimum practical number of vocoder channels. This generally provides bit rates between 1,000 b/s and 4,800 b/s with good-quality speech (objectively speaking).
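The relation between channel count and bit rate is simple frame arithmetic. The sketch below assumes that each frame carries a fixed number of bits per channel amplitude plus a fixed number of pitch and V/UV bits. The bit allocations and frame rates in the example are hypothetical. They are chosen only to show that formats with 14 to 20 channels can fall inside the 1,000 to 4,800 b/s range quoted above.

```python
def channel_vocoder_rate(n_channels: int, bits_per_channel: int,
                         excitation_bits: int, frames_per_s: float) -> float:
    """Bit rate of a channel vocoder frame format, in b/s.

    Each frame is assumed to hold one quantized envelope value per channel
    plus a block of pitch and V/UV (excitation) bits.
    """
    bits_per_frame = n_channels * bits_per_channel + excitation_bits
    return bits_per_frame * frames_per_s


# Hypothetical allocations at the two ends of the 14-20 channel range
print(channel_vocoder_rate(14, 2, 6, 40))   # 34 bits x 40 frames/s = 1360
print(channel_vocoder_rate(20, 3, 7, 50))   # 67 bits x 50 frames/s = 3350
```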

Formant  Vocoder 

The formant vocoder provides a somewhat more sophisticated approach to the spectral analysis of speech than the channel vocoder. The spectral envelope contains several prominent peaks around which the frequency components of the speech signal are grouped. These peaks are local resonances, or resonant frequencies, of the vocal tract and are known as formants. Below 3 kHz there are usually three formants, and below 4 kHz there are usually four or five. The phrase "there are usually ... below" indicates only the general location of the formants, since these spectral peaks tend to shift with the production of different sounds. Figure A3-4 shows two sample speech spectrograms. The formants are the frequency groupings appearing at the center of the more-or-less horizontal shadings; the narrow band analysis of (b) best indicates these formant frequencies. Figure A3-5 shows the formant and pitch structure of a small portion (the segment between 1.6 and 1.7 seconds) of Figure A3-4 in more detail.

The analyzer portion of the vocoder determines and encodes the formant amplitudes, frequencies, and their associated bandwidths. Additionally, the pitch and V/UV determinations are made and encoded. Various methods exist for tracking and extracting formant information [50, 83, 85, 112]. More recent efforts [4, 50, 83, 88] have determined that spectral analysis by linear prediction is the most accurate method for formant extraction. Equations A3-37 through A3-40, which describe the operation of LPC vocoders, lead to an error expression for the difference between the actual (sampled) speech signal, $s(n)$, and the predicted speech signal, $\hat{s}(n)$. Equation A3-44 is a set of simultaneous, linear autocorrelation equations whose solution yields a spectral analysis of the speech signal. From this spectrum, the formant frequencies and amplitudes are determined by simple peak-picking methods. More sophisticated vocoders employ additional methods [132] for extracting formants when they tend to merge together or when "extra" peaks are identified as possible formants. These methods are usually implemented in software to accommodate their decision-making logic. Figures A3-6 and A3-7 show example block diagrams of formant vocoders.
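To make the peak-picking step concrete, the following Python sketch evaluates an all-pole (LPC-style) envelope $1/|A(e^{j\omega})|$ on a frequency grid and reports its local maxima as formant candidates. It is only an illustration under stated assumptions: the 8 kHz sampling rate, the three resonance frequencies and bandwidths, and the helper names are hypothetical, and the predictor coefficients are built from chosen poles rather than estimated from speech. It also omits the extra logic [132] needed for merged or spurious peaks.

```python
import cmath
import math

FS = 8000.0  # assumed sampling rate in Hz (hypothetical)


def resonance_poles(freqs_hz, bws_hz):
    """Conjugate pole pairs for resonances at the given frequencies and bandwidths."""
    poles = []
    for f, bw in zip(freqs_hz, bws_hz):
        r = math.exp(-math.pi * bw / FS)   # pole radius set by the bandwidth
        theta = 2 * math.pi * f / FS       # pole angle set by the frequency
        poles += [r * cmath.exp(1j * theta), r * cmath.exp(-1j * theta)]
    return poles


def predictor_from_poles(poles):
    """Expand prod(1 - p z^-1) and return a_k such that A(z) = 1 - sum a_k z^-k."""
    coeffs = [1.0 + 0j]
    for p in poles:
        new = coeffs + [0j]
        for i in range(1, len(new)):
            new[i] -= p * coeffs[i - 1]
        coeffs = new
    return [-c.real for c in coeffs[1:]]  # conjugate pairs leave real coefficients


def envelope_db(a, n_points=512):
    """20 log10 |1 / A(e^jw)| on n_points frequencies from 0 to FS/2 (gain G = 1)."""
    env = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        A = 1 - sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a, start=1))
        env.append(-20 * math.log10(abs(A)))
    return env


def pick_peaks(env, n_points=512):
    """Simple peak picking: report bins larger than both neighbours, in Hz."""
    hz_per_bin = (FS / 2) / (n_points - 1)
    return [i * hz_per_bin for i in range(1, len(env) - 1)
            if env[i] > env[i - 1] and env[i] > env[i + 1]]


a = predictor_from_poles(resonance_poles([500, 1500, 2500], [80, 100, 120]))
formants = pick_peaks(envelope_db(a))
print([round(f) for f in formants])
```

With well-separated, narrow resonances each peak lies within a few grid bins of the chosen resonance frequency; closely spaced formants are exactly the case in which this simple rule breaks down.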


AMPLITUDE  (OB) 


136 
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Figure  A3-5.  Logarithmic  spectrum  of  "EE"  sound  in  procEEdings. 
Spectral  envelope  shows  four  peaks  or  formants.  The  fine 
structure  corresponds  to  a  Fundamental  frequency 
of  about  110  HZ. 

(Taken  from  ref.  115,  p.  722) 


Figure  A3-6.  Block  diagram  of  the  procedure  for  determining 
formant  frequencies  and  amplitudes. 

(Taken  from  ref.  132,  p.  346) 



Figure  A3-7.  Formant  vocoder  using  parallel  synthesis. 

(Taken  from  ref.  115,  p.  725) 


A  second  method  is  based  on  spectral  moments,  such  as  the 
centroid  of  the  signal  spectrum  in  a  formant  frequency  band  (116). 

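The mean frequency can be measured by dividing the speech band into sub-bands with filters and measuring the amplitude, $a_n$, in each. With $f_n$ denoting the center frequency of the $n$-th sub-band, the mean frequency is then:

$$f_{\text{mean}} = \frac{\sum_{n=1}^{N} f_n a_n}{\sum_{n=1}^{N} a_n} \tag{A3-7}$$

where $N$ is the number of sub-bands utilized. This can be performed in the time domain by:

$$f_{\text{mean}} = \frac{1}{2\pi}\,\frac{\overline{|\dot{s}(t)|}}{\overline{|s(t)|}} \tag{A3-8}$$

where $\dot{s}(t)$ is the time derivative of the speech signal and the bars indicate long-time averages.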
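A minimal Python sketch of (A3-7) and (A3-8), assuming the sub-band center frequencies and amplitudes are already measured and, for the time-domain form, that the derivative can be approximated by a scaled first difference. The filter-bank values and the 500 Hz test tone are hypothetical. For a pure tone the time-domain ratio returns the tone frequency; for broadband speech it is only an approximation of the mean frequency.

```python
import math


def centroid_from_subbands(center_freqs_hz, amplitudes):
    """Mean frequency per (A3-7): sum(f_n * a_n) / sum(a_n)."""
    num = sum(f * a for f, a in zip(center_freqs_hz, amplitudes))
    return num / sum(amplitudes)


def mean_freq_time_domain(samples, fs):
    """Mean frequency per (A3-8): avg|ds/dt| / (2*pi*avg|s|).

    The derivative is approximated by a first difference scaled by fs,
    and the long-time averages are plain means over the samples."""
    deriv = [(samples[i] - samples[i - 1]) * fs for i in range(1, len(samples))]
    avg_abs_deriv = sum(abs(d) for d in deriv) / len(deriv)
    avg_abs_sig = sum(abs(x) for x in samples[1:]) / len(deriv)
    return avg_abs_deriv / (2 * math.pi * avg_abs_sig)


# Hypothetical four-band analyzer and a 500 Hz test tone sampled at 16 kHz.
print(centroid_from_subbands([250, 750, 1250, 1750], [1, 3, 3, 1]))
fs = 16000
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(3200)]
print(round(mean_freq_time_domain(tone, fs), 1))
```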

The problem with formant vocoders of this type is that the formant
sub-bands usually overlap.

Analysis-by-synthesis vocoders (7) are a highly specific form of formant vocoder that does not use the previously described analysis methods. These vocoders iteratively generate artificial spectra, which are matched to the windowed segment of the speech spectrum. After matching, the formant characteristics of the spectrum generator are taken as those of the actual speech. For real-time operation, a highly complex, extremely fast computer is necessary to generate the trial spectra quickly enough.
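The matching loop can be sketched as a brute-force search in Python: synthesize a trial spectrum for each candidate formant frequency and keep the candidate with the smallest squared log-spectral error. This is a deliberately simplified, hypothetical model (a single two-pole resonator with a fixed 100 Hz bandwidth, an exhaustive 10 Hz grid instead of iterative refinement, and a synthetic "observed" spectrum). It also hints at why real-time operation demanded fast hardware, since every candidate requires a full spectrum evaluation.

```python
import cmath
import math

FS = 8000.0  # assumed sampling rate in Hz (hypothetical)


def resonator_db(freq_hz, bw_hz, n_points=256):
    """Log magnitude (dB) of one two-pole resonator, evaluated from 0 to FS/2."""
    r = math.exp(-math.pi * bw_hz / FS)
    p = r * cmath.exp(2j * math.pi * freq_hz / FS)
    out = []
    for i in range(n_points):
        zinv = cmath.exp(-1j * math.pi * i / (n_points - 1))
        denom = (1 - p * zinv) * (1 - p.conjugate() * zinv)
        out.append(-20 * math.log10(abs(denom)))
    return out


def analysis_by_synthesis(observed_db, candidates_hz, bw_hz=100.0):
    """Return the candidate whose synthetic spectrum best matches (least squares)."""
    best_f, best_err = None, float("inf")
    for f in candidates_hz:
        trial = resonator_db(f, bw_hz, len(observed_db))
        err = sum((o - t) ** 2 for o, t in zip(observed_db, trial))
        if err < best_err:
            best_f, best_err = f, err
    return best_f


observed = resonator_db(1230.0, 100.0)   # stands in for a measured spectrum
print(analysis_by_synthesis(observed, range(200, 3800, 10)))
```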


Formant vocoders use the same V/UV and pitch extraction techniques as channel vocoders, and the synthesizer uses the same pulse or noise excitation signals for remodulating the speech signal.

Homomorphic  Vocoder 

The homomorphic, or cepstrum, vocoder is an even more complex and sophisticated system of speech analysis-synthesis than those previously mentioned. These systems use more recent advances in FFT computation and signal deconvolution to analyze and then synthesize voice communications. A high-resolution spectral analysis, or Fourier transform, is computed on a windowed segment (20-40 ms) of speech (42). Within this segment the speech waveform is very nearly stationary, which allows the Fourier transform to be applied to obtain the spectrum amplitude, as in the channel vocoder. The logarithm of the spectrum amplitude is then computed. The transform changes the speech waveform from a convolution of two functions into a product of two functions, and the logarithm then changes that product into a sum of two functions.

(The Z-transform could be utilized as readily as the Fourier transform.) The speech signal is described by:

$$s(nT) = v(nT) * e(nT), \tag{A3-9}$$

where $s(nT)$ is the speech waveform, $v(nT)$ is the vocal tract impulse response, $e(nT)$ is the excitation function, and $*$ denotes convolution. The spectrum is given by:

$$S(f) = V(f)\,E(f). \tag{A3-10}$$

The spectrum log magnitude, $\hat{S}(f)$, is given by:

$$\hat{S}(f) = \ln|S(f)|. \tag{A3-11}$$

Inserting (A3-10) yields:

$$\hat{S}(f) = \ln|V(f)\,E(f)|, \tag{A3-12}$$

which expands into:

$$\hat{S}(f) = \ln|V(f)| + \ln|E(f)|. \tag{A3-13}$$

The log magnitude of the spectrum is then transformed back into the time domain (inverse transform). This results in the cepstrum, $c(nT)$, a signal in which the envelope function (vocal-tract response) is concentrated in the low-time values and the excitation (pitch) appears as a periodic set of lines. The cepstrum is given by:

$$c(nT) = \mathcal{F}^{-1}[\hat{S}(f)]; \tag{A3-14}$$

therefore,

$$c(nT) = \mathcal{F}^{-1}\{\ln|V(f)|\} + \mathcal{F}^{-1}\{\ln|E(f)|\}. \tag{A3-15}$$

If the cepstrum is multiplied by a low-time window, the result is a smoothed envelope (cepstral smoothing). This smoothed envelope is easily quantized and transmitted to the synthesizer. Since the pitch appears as a series of pulses, the pitch period is reasonably easy to determine. This process is depicted in Figure A3-8. Figure A3-9 shows a block diagram of a cepstrum vocoder.
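The cepstral processing of (A3-11) through (A3-15) can be sketched in a few lines of Python. The example uses a hypothetical voiced frame (a one-pole "vocal tract" driven by a pulse train with a 57-sample period) and a direct DFT to stay dependency-free; the quefrency search range of 20 to 200 samples and the low-time cutoff of 20 samples are assumptions for illustration. A practical vocoder would use an FFT and a tapered analysis window.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT (dependency-free; an FFT would be used in practice)."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [sum(x[t] * tw[(k * t) % n] for t in range(n)) for k in range(n)]


def real_cepstrum(x, eps=1e-12):
    """c(n) = inverse DFT of ln|DFT(x)|; eps guards against log(0)."""
    n = len(x)
    log_mag = [math.log(abs(X) + eps) for X in dft(x)]
    # log_mag is real and even, so a forward DFT divided by n equals the inverse DFT.
    return [v.real / n for v in dft(log_mag)]


def pitch_period(cep, q_min, q_max):
    """Quefrency (in samples) of the largest cepstral value in [q_min, q_max]."""
    return max(range(q_min, q_max + 1), key=lambda q: cep[q])


def cepstral_smoothing(cep, cutoff):
    """Low-time lifter: keep |quefrency| < cutoff and return the smoothed ln|S| per bin."""
    n = len(cep)
    lifted = [c if (q < cutoff or q > n - cutoff) else 0.0 for q, c in enumerate(cep)]
    return [v.real for v in dft(lifted)]


# Hypothetical voiced frame: one-pole impulse response convolved with a pulse train.
N, P = 512, 57
h = [0.6 ** n for n in range(N)]
e = [1.0 if n % P == 0 else 0.0 for n in range(N)]
s = [sum(e[m] * h[n - m] for m in range(n + 1)) for n in range(N)]

cep = real_cepstrum(s)
print(pitch_period(cep, 20, 200))        # pitch-period estimate in samples
envelope = cepstral_smoothing(cep, 20)   # smoothed log spectrum, one value per DFT bin
```

The search range excludes the lowest quefrencies because, as noted above, that region is dominated by the vocal-tract envelope rather than by the excitation.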

Figure A3-8. Homomorphic analysis for voiced and unvoiced speech.
(Panel axes: time (msec) and frequency (kHz).)

(Taken from ref. 104, page 690)


Figure A3-9. Block diagram of a homomorphic vocoder.

(Taken from ref. 42, p. 1643)


Pattern-Matching Vocoder

The pattern-matching vocoder is another spectrum analysis system. It stores a series of spectral patterns that match the number of "just-discernible" sounds. The number of these patterns differs from system to system according to how the designer determines what constitutes the just-discernible sounds, or according to what the designer determines is the minimum number of sound patterns necessary to recreate the speech sounds accurately. In today's digital systems the pattern matching is generally performed by point-by-point subtraction. The patterns to be matched can be generated by any of several other schemes. A channel-type vocoder can be employed to generate small frequency elements, each of which is a segment of the Fourier transform. Alternatively, a formant analysis may be performed and the formant trajectories compared to idealized formants. Cepstral analysis could also be employed. Once the speech is analyzed, it is compared to the patterns in memory. For a pattern to count as a match, the error must fall within some maximum error distance. For transmission, only the memory location of the "best guess" pattern is sent. Additionally, the pitch and V/UV data must be determined and transmitted. At the synthesizer, the address is used to retrieve the "correct" pattern, which is then modulated with the pitch and V/UV source to regenerate the speech sounds. With the fast microprocessors available today, this method is becoming more appealing.
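A minimal Python sketch of the matching and addressing steps, assuming each analyzed frame has already been reduced to a short vector of spectral values. The codebook contents, the sum-of-absolute-differences distance (one form of point-by-point subtraction), the threshold, and the packet layout are hypothetical choices for illustration.

```python
def match_pattern(frame, codebook, threshold):
    """Index of the stored pattern with the smallest point-by-point error
    (sum of absolute differences), or None if no pattern is close enough."""
    best_index, best_err = None, float("inf")
    for index, pattern in enumerate(codebook):
        err = sum(abs(x - y) for x, y in zip(frame, pattern))
        if err < best_err:
            best_index, best_err = index, err
    return best_index if best_err <= threshold else None


# Hypothetical 4-band spectral patterns (e.g., channel-vocoder band levels in dB).
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [10.0, 8.0, 4.0, 2.0],
    [2.0, 6.0, 10.0, 6.0],
    [5.0, 5.0, 5.0, 5.0],
]

# Analyzer: only the address (plus pitch and V/UV) would be transmitted.
frame = [2.5, 5.5, 9.0, 6.5]
address = match_pattern(frame, CODEBOOK, threshold=4.0)
packet = (address, 120.0, True)   # hypothetical (address, pitch in Hz, voiced) tuple

# Synthesizer: the address retrieves the stored pattern.
print(packet, CODEBOOK[packet[0]])
```

Returning None when no pattern is close enough mirrors the requirement that the error satisfy a distance criterion; what a real system does in that case (send the nearest pattern anyway, or fall back to another coding mode) is a design decision not covered here.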

Phase  Vocoder 

In the phase vocoder, the speech signal is represented by its complex short-time Fourier transform or, in other words, by its short-time amplitude and phase spectra (35, 111). The system uses a bank of adjacent bandpass filters (channels) to perform the Fourier analysis. After filtering, the channel outputs can be recombined with essentially negligible degradation. This is shown in Figure A3-10. The output of the n-th filter is $f_n(t)$. The reconstructed, approximate signal is given by:

$$\hat{f}(t) = \sum_{n=1}^{M} f_n(t). \tag{A3-16}$$

The n-th filter has an impulse response given by:

$$g_n(t) = h(t)\cos\omega_n t, \tag{A3-17}$$

where $h(t)$ is the impulse response of a physically realizable low-pass filter. The filter output is:

$$f_n(t) = f(t) * g_n(t), \tag{A3-18}$$

which is expanded into:

$$f_n(t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\cos[\omega_n(t-\lambda)]\,d\lambda \tag{A3-19}$$

or

$$f_n(t) = \mathrm{Re}\left\{ e^{j\omega_n t} \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\,e^{-j\omega_n \lambda}\,d\lambda \right\}, \tag{A3-20}$$

where the integral in (A3-20) is the short-time Fourier transform. The transform can be expressed in terms of its amplitude and phase as

$$f_n(t) = |F(\omega_n,t)|\cos[\omega_n t + \varphi(\omega_n,t)], \tag{A3-21}$$


where $F(\omega_n,t)$ is the complex short-time transform and $\varphi(\omega_n,t)$ is the short-time phase spectrum. $|F(\omega_n,t)|$ can be bandlimited to 20 or 30 Hz without perceptual distortion. $\varphi(\omega_n,t)$, however, is unbounded and therefore not suitable for transmission; instead, its time derivative, $\dot{\varphi}(\omega_n,t)$, is computed for transmission. The signal can now be approximated as:

$$\hat{f}_n(t) = |F(\omega_n,t)|\cos[\omega_n t + \hat{\varphi}(\omega_n,t)], \tag{A3-22}$$

where

$$\hat{\varphi}(\omega_n,t) = \int^{t} \dot{\varphi}(\omega_n,\tau)\,d\tau. \tag{A3-23}$$

Equation (A3-23) shows how $\varphi(\omega_n,t)$ is recovered to within an additive constant. Because the ear is relatively insensitive to phase, this constant phase error poses no serious problem.

Figure A3-10. Filtering of speech by adjacent band-pass filters.

(Taken from ref. 35, p. 1494)

The synthesizer reconstructs the signal by summing the outputs of M oscillators, one per channel, modulated in phase and amplitude by bandlimited versions of $|F(\omega_n,t)|$ and $\dot{\varphi}(\omega_n,t)$. This process is shown in Figure A3-11.

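A computer implementation is shown in Figure A3-12. The mathematical process uses the real and imaginary components of the complex spectrum

$$F(\omega_n,t) = a(\omega_n,t) - j\,b(\omega_n,t), \tag{A3-24}$$

where

$$a(\omega_n,t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\cos(\omega_n\lambda)\,d\lambda \tag{A3-25}$$

and

$$b(\omega_n,t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\sin(\omega_n\lambda)\,d\lambda. \tag{A3-26}$$

Now the amplitude and the phase derivative are given by:

$$|F(\omega_n,t)| = \left(a^2 + b^2\right)^{1/2} \tag{A3-27}$$

and

$$\dot{\varphi}(\omega_n,t) = \frac{b\,\dot{a} - a\,\dot{b}}{a^2 + b^2}. \tag{A3-28}$$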


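For computer implementation these equations become:

$$a(\omega_n,mT) = T\sum_{l=0}^{m} f(lT)\cos(\omega_n lT)\,h(mT-lT) \tag{A3-29}$$

and

$$b(\omega_n,mT) = T\sum_{l=0}^{m} f(lT)\sin(\omega_n lT)\,h(mT-lT), \tag{A3-30}$$

where $T$ is the sampling interval. The differences are computed as:

$$\Delta a = a[\omega_n,(m+1)T] - a[\omega_n,mT] \tag{A3-31}$$

and

$$\Delta b = b[\omega_n,(m+1)T] - b[\omega_n,mT]. \tag{A3-32}$$

The magnitude and phase increment in discrete form are:

$$|F(\omega_n,mT)| = \left(a^2 + b^2\right)^{1/2} \tag{A3-33}$$

$$\Delta\varphi(\omega_n,mT) = \frac{b\,\Delta a - a\,\Delta b}{a^2 + b^2}. \tag{A3-34}$$

The computer resynthesis for one channel is:

$$\hat{f}_n(mT) = |F(\omega_n,mT)|\cos\left[\omega_n mT + \sum_{l=0}^{m}\Delta\varphi(\omega_n,lT)\right]. \tag{A3-35}$$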


This  is  summed  for  all  channels  to  recover  the  speech  signal.  This 
method  eliminates  the  need  for  pitch  and  V/UV  decision  extraction. 
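The single-channel computation can be sketched in Python with the sampling interval normalized to $T = 1$. The low-pass filter is an assumed one-pole response $h(k) = (1-\alpha)\alpha^k$, which lets $a$ and $b$ be updated recursively instead of re-summing (A3-29) and (A3-30) at every sample; the channel frequency, the test tone, and $\alpha = 0.99$ are hypothetical. For a steady tone slightly above the channel center, the average phase increment should approach the frequency offset, which is what the channel transmits in place of the raw phase.

```python
import math


def analyze_channel(x, wn, alpha=0.99):
    """One phase-vocoder channel with T = 1 sample.

    a and b follow (A3-29)/(A3-30) for an assumed one-pole low-pass
    h(k) = (1 - alpha) * alpha**k, evaluated recursively.  Returns the
    magnitude per (A3-27) and a discrete phase increment derived from
    (A3-28), using the difference from the previous sample."""
    a_prev = b_prev = 0.0
    mags, dphis = [], []
    for m, sample in enumerate(x):
        a = alpha * a_prev + (1 - alpha) * sample * math.cos(wn * m)
        b = alpha * b_prev + (1 - alpha) * sample * math.sin(wn * m)
        da, db = a - a_prev, b - b_prev
        mag2 = a * a + b * b
        mags.append(math.sqrt(mag2))
        dphis.append((b * da - a * db) / mag2 if mag2 > 0 else 0.0)
        a_prev, b_prev = a, b
    return mags, dphis


def resynthesize_channel(mags, dphis, wn):
    """(A3-35): |F| cos(wn*m + running sum of phase increments)."""
    out, phase = [], 0.0
    for m, (mag, dphi) in enumerate(zip(mags, dphis)):
        phase += dphi
        out.append(mag * math.cos(wn * m + phase))
    return out


w0, wn = 1.005, 1.0          # test tone and channel center, rad/sample (hypothetical)
x = [math.cos(w0 * m) for m in range(3000)]
mags, dphis = analyze_channel(x, wn)
y = resynthesize_channel(mags, dphis, wn)

steady = dphis[1000:]        # skip the filter's start-up transient
print(sum(steady) / len(steady), w0 - wn)
```

Summing the channel outputs of such a bank of channels would then give the reconstructed signal, with no pitch or V/UV decision anywhere in the loop.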

Linear  Predictive  Coding  Vocoders 

Linear Predictive Coding (LPC) is, at this time, the most common method of vocoder implementation. The LPC process is usually based on autocorrelation analysis. The speech signal is highly repetitive and redundant in its features. As indicated previously, the vocal tract is a slowly varying system; this is what gives rise to the repetitiveness of speech. This characteristic is easily exploited, since adjacent segments of the signal are highly correlated. This means that, given $p$ past samples of the waveform, the next sample can generally be predicted with a high degree of accuracy. Increasing $p$ tends to improve the prediction accuracy. The predictor weights are recomputed every 20-40 ms. This type of analysis attempts to model the vocal tract. The most general form of this model is:
$$H(z) = G\,\frac{1 + \sum_{l=1}^{q} b_l z^{-l}}{1 - \sum_{k=1}^{p} a_k z^{-k}}. \tag{A3-36}$$

It has been found (95) that it is sufficiently accurate to let the numerator of $H(z)$ equal $G$, a constant gain. This results in an all-pole model.

In general terms, an LPC vocoder generates the predictor coefficients. The coefficients are then used in a correlation procedure, and the prediction is matched against the incoming segment of speech just predicted. This generates an error signal, and the next set of predictors is generated so as to minimize that error.

In LPC systems, speech is modeled by an all-pole filter, $H(z)$. The filter transfer function is given by:

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}. \tag{A3-37}$$


The frequency-domain and time-domain models for this process are shown in Figure A3-13. This model assumes that a frame or window of speech can be expressed as:

$$s(n) = \sum_{k=1}^{p} a_k\,s(n-k) + U_n, \tag{A3-38}$$

where $p$ is the number of poles, $U_n$ is the appropriate input excitation, and the $a_k$'s are the predictor coefficients characterizing the filter. To generate the speech signal, knowledge of the pitch, the filter parameters, and the gain of the filter (amplitude of the input excitation) is needed in each frame (11). Filters with various numbers of poles are used, with eight to sixteen being the most common.
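The all-pole model of (A3-38) can be exercised directly. The Python sketch below drives a hypothetical one-pole predictor with a pulse-train (voiced) excitation and then inverse-filters the output to recover the excitation; unvoiced frames would use a noise source instead, which is not shown.

```python
def all_pole_synthesis(a, excitation):
    """s(n) = sum_{k=1}^{p} a_k s(n-k) + U_n per (A3-38), starting from rest."""
    s = []
    for n, u in enumerate(excitation):
        s.append(u + sum(ak * s[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0))
    return s


def inverse_filter(a, s):
    """s(n) - sum_k a_k s(n-k): removes the predictable part, leaving the excitation."""
    return [s[n] - sum(ak * s[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
            for n in range(len(s))]


def pulse_train(length, period, gain=1.0):
    """Voiced excitation: one nonzero sample at the start of each pitch period."""
    return [gain if n % period == 0 else 0.0 for n in range(length)]


a = [0.5]                       # hypothetical one-pole predictor
u = pulse_train(6, 3)
s = all_pole_synthesis(a, u)
print(s)                        # [1.0, 0.5, 0.25, 1.125, 0.5625, 0.28125]
print(inverse_filter(a, s) == u)  # True
```

The inverse filter output is nonzero only at the pulse positions, which is the property exploited by the error minimization that follows.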


In (A3-38), $U_n$ is zero except for one sample at the beginning of each pitch period. Thus, away from those samples, the speech can be predicted from its past values:

$$\hat{s}(n) = \sum_{k=1}^{p} a_k\,s(n-k). \tag{A3-39}$$

If the model were perfect, the speech samples, $s(n)$, would be completely predictable. Because this perfect model does not exist, it is necessary to define an error, $E(n)$, between $s(n)$, the sampled speech, and $\hat{s}(n)$, the predicted speech. This error is given as:

$$E(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k\,s(n-k). \tag{A3-40}$$

The mean square error is given by:

$$\langle E^2(n)\rangle = \sum_{n} \left[ s(n) - \sum_{k=1}^{p} a_k\,s(n-k) \right]^2. \tag{A3-41}$$

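The $a_k$'s are chosen to minimize this error. This can be done by computing:

$$\frac{\partial\langle E^2(n)\rangle}{\partial a_j} = 0, \qquad j = 1, 2, \ldots, p, \tag{A3-42}$$

yielding the set of equations:

$$\sum_{k=1}^{p} a_k \sum_{n} s(n-k)\,s(n-j) = \sum_{n} s(n)\,s(n-j), \qquad j = 1, 2, \ldots, p. \tag{A3-43}$$

The right side of this equation constitutes an autocorrelation function, $R_j$. Treating the inner sum on the left as the autocorrelation $R_{|j-k|}$ in the same way, (A3-43) can be expressed as:

$$\sum_{k=1}^{p} a_k\,R_{|j-k|} = R_j, \qquad 1 \le j \le p. \tag{A3-44}$$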


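This set of equations can be solved recursively as follows:

$$E_0 = R_0 \tag{A3-45a}$$

$$k_j = \frac{R_j - \sum_{i=1}^{j-1} a_i^{(j-1)} R_{j-i}}{E_{j-1}}, \qquad a_j^{(j)} = k_j \tag{A3-45b}$$

$$a_n^{(j)} = a_n^{(j-1)} - k_j\,a_{j-n}^{(j-1)}, \qquad 1 \le n \le j-1 \tag{A3-45c}$$

$$E_j = (1 - k_j^2)\,E_{j-1} \tag{A3-45d}$$

The recursion is repeated for $j = 1, 2, \ldots, p$, and the final solution is given by:

$$a_n = a_n^{(p)}, \qquad 1 \le n \le p. \tag{A3-45e}$$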


The $k_j$'s are reflection coefficients (or partial correlation coefficients). By expanding the squared terms in (A3-41) and using (A3-44), the minimum error is given by:

$$E_p = R_0 - \sum_{k=1}^{p} a_k R_k. \tag{A3-46}$$
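The autocorrelation method and the recursion above can be sketched in Python as follows, using the sign convention of (A3-38) through (A3-46). The frame values and the order $p = 3$ are hypothetical, and no analysis window is applied, although a practical LPC analyzer would normally window the frame before computing $R_j$.

```python
def autocorrelation(s, max_lag):
    """R_j = sum_n s(n) s(n-j) over one frame (samples outside the frame are zero)."""
    return [sum(s[n] * s[n - j] for n in range(j, len(s))) for j in range(max_lag + 1)]


def levinson_durbin(R, p):
    """Solve (A3-44) with the recursion (A3-45a)-(A3-45e).

    Returns (a, ks, E): predictor coefficients a_1..a_p, reflection
    coefficients k_1..k_p, and the final prediction error E_p."""
    a = [0.0] * (p + 1)          # a[n] holds a_n^(j); index 0 is unused
    ks = []
    E = R[0]                     # E_0 = R_0
    for j in range(1, p + 1):
        k = (R[j] - sum(a[i] * R[j - i] for i in range(1, j))) / E
        new_a = a[:]
        new_a[j] = k                                  # a_j^(j) = k_j
        for n in range(1, j):
            new_a[n] = a[n] - k * a[j - n]            # order update
        a = new_a
        E = (1.0 - k * k) * E                         # error update
        ks.append(k)
    return a[1:], ks, E


# Hypothetical frame; the autocorrelation method keeps the Toeplitz system positive definite.
frame = [1.0, 0.9, 0.3, -0.2, -0.4, -0.1, 0.2, 0.3, 0.1, -0.1]
p = 3
R = autocorrelation(frame, p)
a, ks, E = levinson_durbin(R, p)
print(a, ks, E)
```

The recursion needs on the order of $p^2$ operations instead of the $p^3$ of a general linear solver, and it produces the reflection coefficients and the error of every lower order as by-products.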

Given the above derivation of the error and predictor coefficients, alternative sets of transmission parameters can be derived. Below is a list of the possible sets of parameters that uniquely characterize the linear prediction filter $H(z)$ (77). The denominator of $H(z)$ is an inverse filter, $A(z)$, given by:

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}. \tag{A3-47}$$

The various parameters suitable for transmission are the:

1. Impulse response of $A(z)$, i.e., the predictor coefficients $a_k$, $1 \le k \le p$,

2. Impulse response of the all-pole model, $h_k$, $0 \le k \le p$, easily obtained by long division,

3. Autocorrelation coefficients of $(a_k/G)$, given by:

$$b_j = \sum_{n=0}^{p-|j|} \alpha_n\,\alpha_{n+|j|}, \qquad 0 \le j \le p, \tag{A3-48}$$

where $\alpha_0 = 1$ and $\alpha_n = -a_n$ ($1 \le n \le p$) are the coefficients of $A(z)$,

4. Autocorrelation coefficients of $(h_k)$ (partial correlation coefficients), given by:

$$r_j = \sum_{n=0}^{\infty} h_n\,h_{n+|j|}, \qquad 0 \le j \le p, \tag{A3-49}$$

where $r_j = R_j$ in (A3-44) for $0 \le j \le p$,
5. Spectral coefficients of $A(z)/G$, $P_j$, $0 \le j \le p$ (or equivalently, spectral coefficients of $H(z)$, $1/P_j$), given by:

$$P_j = b_0 + 2\sum_{n=1}^{p} b_n \cos(n\theta_j), \qquad 0 \le j \le p, \tag{A3-50}$$

where $\theta_j$ is the frequency at which the $j$-th spectral coefficient is taken (this results in a Linear Prediction Channel Vocoder),

6. Cepstral coefficients (log-area-ratio coefficients) of $A(z)$, $c_k$, $1 \le k \le p$ (or equivalently, cepstral coefficients of $H(z)/G$, $-c_k$), given by:

$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log A(e^{j\omega})\,e^{jk\omega}\,d\omega, \tag{A3-51}$$

or in digital form:

(A3-52)


7.  Poles  of  H(z)  (or  zeros  of  A(z)), 

8. Reflection coefficients, $k_j$, $1 \le j \le p$, or simple transformations thereof, e.g., area coefficients given by:

$$A_j = A_{j+1}\,\frac{1 - k_j}{1 + k_j}, \qquad A_{p+1} = 1, \qquad 1 \le j \le p. \tag{A3-53}$$

The reflection coefficients are an intermediate product of the error minimization, but they may also be computed directly from the predictor coefficients by:

$$k_j = a_j^{(j)}, \tag{A3-54}$$

$$a_n^{(j-1)} = \frac{a_n^{(j)} + k_j\,a_{j-n}^{(j)}}{1 - k_j^2}, \qquad 1 \le n \le j-1, \tag{A3-55}$$

where $j$ takes the values $p, p-1, \ldots, 1$, starting initially from

$$a_n^{(p)} = a_n, \qquad 1 \le n \le p, \tag{A3-56}$$


and 

9.  The  error  coefficients.  (If  the  analysis  and  synthesis 
systems  are  guaranteed  to  start  at  the  same  state,  a  measurement  of 
the  signal  matching  error  can  be  used  to  adjust  the  synthesizer.) 
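Conversions between the predictor and reflection coefficients, and from reflection to area coefficients, are short loops. This Python sketch assumes $|k_j| < 1$ (a stable filter) and the sign conventions of the order update (A3-45c), the area relation (A3-53), and the backward recursion (A3-55); the reflection coefficient values are hypothetical.

```python
def step_up(ks):
    """Reflection -> predictor coefficients using the order update of (A3-45c)."""
    a = []
    for k in ks:
        # The comprehension reads the previous-order list before it is replaced.
        a = [a[n] - k * a[len(a) - 1 - n] for n in range(len(a))] + [k]
    return a


def step_down(a):
    """Predictor -> reflection coefficients, j = p, ..., 1 (backward recursion)."""
    a = list(a)                       # a_n^(p) = a_n
    ks = []
    for j in range(len(a), 0, -1):
        k = a[j - 1]                  # k_j = a_j^(j)
        ks.append(k)
        a = [(a[n] + k * a[j - 2 - n]) / (1.0 - k * k) for n in range(j - 1)]
    return ks[::-1]                   # k_1, ..., k_p


def area_coefficients(ks):
    """A_{p+1} = 1 and A_j = A_{j+1} (1 - k_j) / (1 + k_j); returns [A_1, ..., A_{p+1}]."""
    areas = [1.0]
    for k in reversed(ks):
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas[::-1]


ks = [0.5, -0.3, 0.2]                 # hypothetical reflection coefficients
a = step_up(ks)
print(a, step_down(a), area_coefficients(ks))
```

Because the two conversions are exact inverses for a stable filter, a coder is free to quantize and transmit whichever parameter set is most robust and convert at the receiver.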

Figure A3-14 shows a block diagram of a linear predictive coding system. Figure A3-15 shows a pole-zero predictor block diagram; the zero portion is not implemented. Pitch and V/UV decisions must also be supplied by the analyzer. Data transmission rates currently vary from about 1.2 kbps to 10 kbps, depending upon implementation.

Spectral Envelope Estimation

The Spectral Envelope Estimation (SEE) vocoder is similar to the homomorphic vocoder in that it uses the log magnitude of the Fourier transform. This is shown in Figure A3-16. It starts with the assumption that speech can be modeled by:

$$S(f) = E(f,T)\,V(f), \tag{A3-57}$$

where $E(f,T)$ is a unity-amplitude impulse train in frequency with lines spaced $1/T$ apart ($T$ being the pitch period), so that $|S(f)|$ is a sampling of the vocal tract response $|V(f)|$ at the points $f = k/T$ for $k = 1, 2, \ldots$ (99). Interpolating between the sample points $|S(k/T)|$ provides the spectral envelope estimate, $|\tilde{V}(f)|$. If enough samples are taken, i.e., if $|V(f)|$ is reasonably smooth relative to $1/T$, the envelope estimate will approximate the ideal vocal tract response:

$$|V(f)| \approx |\tilde{V}(f)|. \tag{A3-58}$$

Figure A3-16. Spectral analyzer structure.
(Taken from ref. 99, p. 787)


In order to use $|\tilde{V}(f)|$ as the estimate, the locations of the samples, $|S(k/T)|$, must be reasonably accurate. To locate the peaks of $|S(f)|$, the speech is assumed stationary within a frame, or windowed segment, of the speech wave. The procedure for finding these peaks is shown in Figure A3-17. The system samples the pitch detection signal and maintains a short-term average of the pitch, $\bar{F}_0$, a characteristic of each individual speaker. This average is given by:

$$\bar{F}_0 = \frac{1}{\text{Avg. Pitch Period}}, \tag{A3-59a}$$

and the search is initialized with

$$k = 1 \quad \text{and} \quad f_0 = 0. \tag{A3-59b}$$

Next, the windowed segment of $|S(f)|$ is searched for the $f_k$ that maximizes $|S(f_k)|$, where:

$$f_{k-1} + \tfrac{1}{2}\bar{F}_0 \le f_k \le f_{k-1} + \tfrac{3}{2}\bar{F}_0, \tag{A3-59c}$$

and $k$ is then incremented by one. This is repeated until the entire segment is covered. If there is a significant difference between the talker's pitch and the average pitch, confusion could occur between the peaks of the spectral lines and the spurious sidelobes caused by windowing the speech.


Once the sampling is completed, an estimate of the entire waveform of $|\tilde{V}(f)|$ is needed. This is obtained by interpolating between the samples $|S(f_k)|$. Since a talker's pitch varies slowly and only over approximately an octave, the average pitch is assumed to be accurate enough that all samples are at or near the desired peaks, and linear interpolation between the samples can be used.

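The spectral envelope estimate is then the estimate of the vocal tract (filter) response:

$$|\tilde{V}(f)| \approx |H(f)|, \tag{A3-60}$$

and $|\tilde{V}(f)|$ can be found from

$$\log|\tilde{V}(f)| = \begin{cases} \log|S(f_1)|, & 0 \le f \le f_1, \\[4pt] \dfrac{f_k - f}{f_k - f_{k-1}}\,\log|S(f_{k-1})| + \dfrac{f - f_{k-1}}{f_k - f_{k-1}}\,\log|S(f_k)|, & f_{k-1} \le f \le f_k,\; k > 1. \end{cases} \tag{A3-61}$$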
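The peak search and interpolation can be sketched in Python on a synthetic spectrum. Because the exact search limits of (A3-59c) are uncertain here, the sketch assumes a zone from $\bar{F}_0/2$ to $3\bar{F}_0/2$ above the previously located line. The test spectrum (lines every 100 Hz on a 10 Hz grid with an exponential envelope), the function names, and holding the envelope flat beyond the last located line are likewise hypothetical.

```python
import math


def find_harmonic_peaks(mag, df, f0_avg):
    """Search-zone peak picking in the spirit of (A3-59a)-(A3-59c).

    mag holds |S(f)| at frequencies 0, df, 2*df, ...; f0_avg is the average
    pitch in Hz (assumed to span several bins).  Each search zone is assumed
    to be [f_{k-1} + f0_avg/2, f_{k-1} + 3*f0_avg/2]."""
    peaks, f_prev = [], 0.0
    f_max = (len(mag) - 1) * df
    while f_prev + 1.5 * f0_avg <= f_max:
        lo = math.ceil((f_prev + 0.5 * f0_avg) / df)
        hi = math.floor((f_prev + 1.5 * f0_avg) / df)
        best = max(range(lo, hi + 1), key=lambda i: mag[i])
        f_prev = best * df
        peaks.append((f_prev, mag[best]))
    return peaks


def log_envelope(f, peaks):
    """Linear interpolation of log|S(f_k)| per (A3-61); held flat below f_1
    and (an added assumption) above the last located line."""
    if f <= peaks[0][0]:
        return math.log(peaks[0][1])
    for (fa, ma), (fb, mb) in zip(peaks, peaks[1:]):
        if f <= fb:
            w = (f - fa) / (fb - fa)
            return (1 - w) * math.log(ma) + w * math.log(mb)
    return math.log(peaks[-1][1])


# Hypothetical spectrum: lines every 100 Hz on a 10 Hz grid, amplitudes exp(-f/1000),
# and a low floor between the lines.
df, f0 = 10.0, 100.0
mag = [math.exp(-i * df / 1000) if i % 10 == 0 and i > 0 else 1e-3 for i in range(201)]
peaks = find_harmonic_peaks(mag, df, f0)
print(len(peaks), peaks[0][0], log_envelope(150.0, peaks))
```

Only the interpolated envelope, sampled on a fixed grid, would need to be coded; as the following paragraph notes, the line locations themselves do not have to be transmitted.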


The above derivation of the spectral envelope estimator is based on the assumption that the harmonics of the periodic impulse source sample the frequency response of the vocal tract filter.

The spectral envelope estimator can also be viewed as an adaptive channel (filter bank) vocoder analyzer or as an improvement on the homomorphic spectral analyzer. Each iteration of the spectral line-finding heuristic selects the point in the short-term (windowed DFT) spectrum centered on the next spectral line. Since each point of a short-term spectrum is equivalent to the output of a filter whose characteristics are determined by the time window, the procedure selects an analysis filter bank that has exactly one filter per spectral line, with each filter centered on the corresponding spectral line. The linear interpolation between spectral points essentially only creates a smooth (spectral) waveform, so the coder need not know about or transmit the locations of the spectral lines (99).

The system must also provide the actual pitch and the V/UV decision for transmission along with the spectral envelope estimate. Figure A3-18 shows the block diagram of an SEE vocoder.